In June last year, I ran the first Generative AI Image Battle, comparing how various popular tools stacked up. Now, a little over half a year later, I’m curious to see how far the same tools have come…read on to find out!
The AI Image Challenge
For this ‘battle’, the task is quite simple:
- Chosen Location – Bali (last time we did Thailand)
- Desired Image – Showcasing People and Location
- Objective – Create compelling visual content to use for destination marketing purposes.
- Success Criteria – Ease, Speed, Realism, Creativity, Usability
Sample Text to Image Prompts
I kept it simple and used the same single prompt with each generative image tool:
- Beautiful Balinese Woman with folded hands, welcome to Bali, Cinematic, Centered Studio Portrait, 4k, Nature in background
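If you want to rerun this kind of comparison yourself, one way to keep the prompt identical across tools is to assemble it from a subject plus a list of style modifiers. Here is a minimal Python sketch; the `build_prompt` helper and the variable names are hypothetical and purely illustrative, not part of any tool's API.

```python
def build_prompt(subject, modifiers):
    """Join a subject and its style modifiers into one comma-separated prompt."""
    # Drop empty modifiers and trim stray whitespace so the output stays clean.
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


# The subject and modifiers below are taken from the prompt used in this post.
SUBJECT = "Beautiful Balinese Woman with folded hands"
MODIFIERS = [
    "welcome to Bali",
    "Cinematic",
    "Centered Studio Portrait",
    "4k",
    "Nature in background",
]

prompt = build_prompt(SUBJECT, MODIFIERS)
print(prompt)
```

Pasting the printed string into each tool keeps the comparison fair, and swapping a single modifier makes it easy to see what each keyword contributes.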
Here’s what DALL·E 3 produced:
This time around, instead of using Bing Images, I used DALL·E 3 directly within ChatGPT, as it does a much better job of adhering to the prompt and producing compelling images.
Compared to earlier generations of the model, DALL·E 3 does an excellent job of producing high-fidelity images. They do skew slightly toward a painted or rendered look, but I suspect OpenAI has done that on purpose to avoid creating images that are “too real”.
Stable Diffusion
The latest version of Stable Diffusion is Stable Diffusion XL (SDXL), which is billed as a leap forward in image generation.
Unfortunately, Stability AI no longer offers this for free, so instead I tested the SDXL Turbo model, which produces images in near real time…but the quality and fidelity still leave something to be desired.
Leonardo.ai
Leonardo has kept adding different models…and depending on the model you use, the quality and fidelity of the output can vary considerably. Below are some examples:
The above was generated with their default “Deliberate 1.1” model, and it has some wonkiness going on, with hands in particular.
The next example uses their “Absolute Reality 1.6” model, which is an improvement, but still has some weirdness…see the fourth image for an example.
The last one I tried used “Leonardo Diffusion XL”, which I suspect is Stable Diffusion XL under the hood…this definitely produced the most realistic output of them all, but still at fairly low resolution and with some artifacts.
Midjourney
Next, on to Midjourney v6.0 Alpha, which was released in late December. Midjourney has improved its own website, but image generation is still tied to Discord for now; it is likely to arrive on the site sometime soon.
I tested both the default v6 mode and Style ‘Raw’, which seems to produce some pretty compelling images.
As you can see, Midjourney still rules the roost in terms of realism and image quality. The ability to vary images to different degrees, edit regions of an image, zoom, and upscale from within the tool are also incredible features. Here’s one of the sample images, upscaled…
Adobe Firefly
Adobe now has Firefly Image 2 out as its latest text-to-image generation model. Here’s what it produced…pretty realistic (especially with the Hyperrealistic option chosen), but the images look nothing like a Balinese woman.
Here’s the image that seemed closest to a Balinese woman (possibly more a tourist in Bali?), but prompt adherence isn’t great overall…note the folded arms (vs. hands folded in welcome).
Adobe adds a Firefly mark to the images, which I think is great for identifying content produced by generative AI. I suspect more tools will start doing this through metadata and watermarks.
I also gave Adobe Photoshop’s Generative Fill a try…which produced truly awful results. Generative Fill seems to do a good job with tweaks and image expansion, but can’t seem to handle generating an entire image from scratch.
Bonus Tool: NightCafe
I also hunted around for other tools…there are plenty of them out there, but most seem to rely on SDXL as the underlying model. NightCafe was no different, producing the following output using Stable Diffusion:
The Verdict
I think the scores I gave in my last Generative AI Text to Image Battle still hold up pretty well. Most of these tools and models have evolved in the past 6-7 months, but it is clear that when it comes to image quality, Midjourney is still king and has improved even further in the past half a year.
I don’t think Adobe Firefly takes second place anymore though…that deservedly goes to DALL·E 3, which is amazing at prompt adherence (thanks to ChatGPT), but stays away from producing truly photorealistic images. Whether that changes in the future or not is anybody’s guess.
What tool are you a fan of? Any amazing tools I’ve missed?