I heard a lot about the new image generation models last week, so I tested them to see what has improved. I gave the prompt below to various image generation models, old and new.
A Calvin and Hobbes strip. Calvin is boxing Hobbes, with a dialog bubble from Calvin, saying “Bring it on!”
Stable Diffusion XL Lightning

Stable Diffusion XL Base

Dall-E API

Runway ML

Imagen 3

Dall-E 3 API

Ideogram 2.0

Flux.dev via Fal.ai

ChatGPT Plus

A few observations:
- Text generation has come a long way. The newer models have little problem generating clear text.
- Flux.1 seems to be the best of the newly released models.
- But OpenAI’s ChatGPT seems to produce output as good as Flux.1’s.
On the last point, it’s noteworthy that Dall-E 3 (the engine behind ChatGPT) gives a poor result when called directly through its API. Clearly, prompting makes a difference. Here’s how ChatGPT rewrote my prompt before passing it to Dall-E 3.
A comic strip style image featuring Calvin, a young boy with spiky hair, standing in a playful boxing stance with oversized boxing gloves. He looks determined as he says ‘Bring it on!’ in a speech bubble. Facing him is Hobbes, a tall and slightly bemused tiger, also in a mock boxing pose with a gentle smile, as if humoring Calvin. The scene is set in Calvin’s backyard, typical of a Calvin and Hobbes comic, with a simple and uncluttered backdrop.
But just as clearly, prompting is far from the primary driver. Here’s the result of the above prompt on the Dall-E 3 API. The model ChatGPT is using behind the scenes seems to be a significant improvement over Dall-E 3.
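For anyone who wants to repeat this comparison, here is a minimal sketch of sending a prompt to the Dall-E 3 API using only Python's standard library. It is not the code used for the images in this post. The helper names `build_request` and `generate`, the image size, and the `OPENAI_API_KEY` environment variable are assumptions made for illustration. The response for Dall-E 3 can include a `revised_prompt` field, which is handy for checking whether the API rewrote the prompt.

```python
import json
import os
import urllib.request

# OpenAI's image generation endpoint.
API_URL = "https://api.openai.com/v1/images/generations"

# The short prompt from the top of this post.
SHORT_PROMPT = (
    "A Calvin and Hobbes strip. Calvin is boxing Hobbes, "
    "with a dialog bubble from Calvin, saying “Bring it on!”"
)


def build_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) the HTTP request for one image.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt, api_key):
    """Send the request and return (image_url, revised_prompt)."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        item = json.load(resp)["data"][0]
    # Either field may be missing, so use .get() rather than indexing.
    return item.get("url"), item.get("revised_prompt")


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")  # assumed to hold your API key
    if key:
        url, revised = generate(SHORT_PROMPT, key)
        print(url)
        print(revised)
```

Running it once with the short prompt and once with the detailed prompt, then comparing the two images, is one way to separate the effect of the prompt from the effect of the model.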

The same detailed prompt does extremely well on Imagen 3, though.

Update: 6 Oct 2024. Here’s what I got with meta.ai.

Update: 8 Oct 2024. Here’s what I got with Flux 1.1 Pro with the short prompt. (The detailed prompt gave me an error: “NSFW content detected in image. Try running it again, or try a different prompt.”)

Update: 4 Dec 2024. With Amazon Nova Canvas, here’s what the detailed prompt gave me.

The Calvin cartoon generated by ChatGPT is eerily close to the original!
It’s amazing to see what these bots can create when you ask them to bring a funny idea to life.
My last experiment brainstorming with the bots – https://mvark.blogspot.com/2024/08/let-ai-handle-overthinking.html