MAD AI - an agency guide to DALL·E 2 and AI text-to-image tools

This article was originally posted to LinkedIn

Since the launch of DALL·E 2 (by OpenAI) in April 2022, the Internet has been awash with images created using artificial intelligence “text-to-image” tools.

With the floodgates of public interest now open, the past three months have seen a wave of alternatives to DALL·E released to the public, opened in beta or announced, including Midjourney, DALL-E Mini (since renamed Craiyon), Google Imagen and, most recently, Stable Diffusion.

Creative, marketing and digital agencies now have at their fingertips the ability to imagine and create unique and (generally) copyright-free images within minutes, using only a text description, to utilise in creative campaigns and user-interface designs for their clients.

Man holding mobile phone

Instead of searching for hours for the perfect stock photo, it’s now possible to type what you want and generate the perfect image. Or, at least, generate a perfectly mad image, which might be exactly what you need (example at right, with the prompt "stock photo of man wearing 1990's suit in office holding mobile phone").

Instead of hiring an illustrator, 3D artist or photographer, you can now write a 10-word description of the scene and have a robot spit out a dozen options for you to choose from. Unsurprisingly, there is plenty of controversy and debate about the impact of AI image generation on artists (good article on this topic here).

A major benefit of these tools for agencies is that they are great for fast ideation, for example creating a mood board to use when briefing a professional illustrator or photographer.

But, from my experiments with the various AI text-to-image tools, none is (yet) suitable for slotting into a standard design project workflow.

Although the images are impressive, you generally don’t get exactly what you have in mind, it’s difficult to change a particular aspect of an image, and the resolution is not high enough for a lot of design uses (although AI resolution upsizers can help).

AI image tools are fantastic for creating imaginative artwork without a defined brief. The results are beautiful and interesting, but not specific.

Another noteworthy limitation is the inability to create legible text within an image. But you should try, as the results are often hilarious (example at right, with the prompt "Mad Men tv show as a robot").

This space is very much in its infancy, with many questions being asked about the impact on the jobs of commercial artists, and also the copyright ownership of images created by AI engines.

Although most of the tools offer commercial usage rights (more details on this below), there is doubt surrounding the copyright ownership of the datasets (images) that were used to train the AI algorithms. These are uncharted waters.

OpenAI (the research lab behind DALL·E) grants “full usage rights to commercialize the images”, so it is clearly very confident that neither users nor OpenAI itself will be sued into oblivion. And it can afford good lawyers: in 2019 OpenAI received a US$1 billion investment from Microsoft.

Below are five of the most popular AI text-to-image tools, with some notes from me on their availability, how they work and whether the images can be used commercially. The rules seem to change frequently, so please read their T&Cs for the most up-to-date information.

Each tool is accompanied by some images I generated, to show how they differ. All test images (except those for Imagen) used the same two prompts: “A 3D render of a beautiful desk lamp on a desk, Octane Render, 4k” and “A handsome man wearing Ray Ban Aviators, studio photo, 4k”.

Judge for yourself...

DALL·E 2

DALL·E images

The OG of text-to-image AI tools, DALL·E popularised the space when it launched its second version (DALL·E 2) in April 2022, and is generally regarded as having the most impressive output.

The image quality of DALL·E is frankly mind-blowing. It can do nearly any style of illustration, painting or photography you can imagine. It can do (kind of) photo-realistic faces, and performs best when the scene is simple.

As DALL·E uses a Web interface, it includes image editing tools such as the ability to upload a source image, and do tricks such as Inpainting (edit inside an image) and Outpainting (expand, or edit outside of an image). These could be very helpful for commercial imagery.

An example of Inpainting can be seen in the Mad AI header image above. This was created by uploading a Mad Men image, erasing the head of Don Draper, and asking DALL·E to replace it with a 1960's robot head ("A 1960's robot head on a man's body").

Access - currently in Beta, with a wait-list. They have announced that they are progressively giving access to 1 million people.

Cost - users are gifted free credits that refill every month, and can buy additional credits (US$15 for 115 generations).

Commercial use - users are granted “full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise”. Images can be downloaded at 1024x1024 resolution (2.2 MB PNG file).

Midjourney

Midjourney images

An alternative to DALL·E, the main benefit of Midjourney is that it’s available to everyone without a wait-list. The images are generally more “painterly” than photographic, and it creates beautiful and sometimes mind-bending art with the right prompts.

Midjourney doesn’t have a Web interface, and is accessed through a Discord channel, with text prompts answered by a Discord bot. Any images you create are publicly viewable, which might hinder use for commercial purposes. Working on a secret product launch campaign for Apple? Midjourney might not be for you.

Access - currently in Beta, with unrestricted public access via Discord.

Cost - Midjourney offers a free plan, plus paid subscription tiers that offer more image generations and more flexible usage rights. The cheapest paid tier is a Basic Membership, which costs US$10 per month and allows for 200 images per month.

Commercial use - currently Midjourney says “You’re pretty free to use the images in just about any way you want”, and “you own all Assets you create with the Services”. Images can be downloaded from Discord and are generated at 1024x1024 resolution (1.3 MB PNG file, after upscaling with the Discord bot).
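For budgeting purposes, the paid prices quoted above for DALL·E 2 and Midjourney can be compared with some quick arithmetic. Below is a rough sketch in Python that uses only the two figures mentioned in this article (US$15 for 115 DALL·E generations, and US$10 per month for 200 Midjourney images). The function name is purely illustrative, a "generation" and an "image" may not be directly comparable units, and the prices are likely to change.

```python
def cost_per_unit(price_usd: float, units: int) -> float:
    """Return the price of a single generation or image in US dollars."""
    if units <= 0:
        raise ValueError("units must be a positive number")
    return price_usd / units


# Figures as quoted in this article at the time of writing; check current pricing.
dalle_per_generation = cost_per_unit(15, 115)   # additional DALL·E 2 credits
midjourney_per_image = cost_per_unit(10, 200)   # Midjourney Basic Membership, per month

print(f"DALL·E 2:   US${dalle_per_generation:.2f} per generation")
print(f"Midjourney: US${midjourney_per_image:.2f} per image")
```

On those figures, the paid rate works out to roughly 13 US cents per DALL·E generation and 5 US cents per Midjourney image, before counting any free credits.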

Stable Diffusion

Stable Diffusion images

Stable Diffusion has stormed onto the AI image scene, boasting incredible image quality that is comparable to DALL·E. Controversially, Stable Diffusion has plans to open source its models and weights, which will allow other software developers to build their own image generation tools.

The reason for the debate is that (unlike DALL·E) Stable Diffusion has far fewer filters and controls on the type of images that can be created. For example, you can ask for images of famous people to be generated, in close to photographic quality. It can also be used for NSFW imagery, and the scope for deep-fakes is immense.

Ethical concerns aside, as Stable Diffusion follows its roadmap, the entire text-to-image AI scene will explode with creative uses and applications.

Access - currently in closed Beta with a wait-list (apply here), and only available via a Discord channel for now. They have plans to launch a Web version soon, which will reportedly offer editing tools similar to DALL·E.

Cost - currently free to use

Commercial use - images can be used for commercial purposes, but are in the public domain and can be used by anybody. Images can be saved from Discord as PNG files at the default resolution of 512x512 pixels (500 KB). Higher-resolution images, up to 1024x1024 (1.5 MB), can be requested within Discord, but this negatively affects the image composition (i.e. gives strange results!).


Craiyon (formerly DALL-E Mini)

Craiyon generated image

“DALL-E Mini” launched soon after the official DALL·E launch, and (unlike the official DALL·E) was publicly available with no wait-list.

Thanks to the open access, and the confusion over the name (people mistaking it for the real DALL·E), “Mini” received a heap of publicity and traffic. Due to this, image generation is slow, often taking several minutes for one prompt.

The tool has since been renamed Craiyon (after receiving a “polite” letter from OpenAI’s lawyers), and its image quality is a poor imitation of the official DALL·E. It’s fun for first-timers experimenting, but it's hard to imagine professional designers and agency teams using Craiyon when the alternatives listed here generate more compelling and higher quality images.

Access - free to use, ad supported (lots of ads!)

Cost - free, with a premium tier planned. If used for commercial purposes, images must be attributed to Craiyon. Presumably this will change when the premium tier launches.

Commercial use - yes, but you must attribute images to Craiyon. There is also a Paid Commercial License for larger businesses (details on commercial use are not available in the Craiyon T&Cs). Images can be screenshotted and saved as 1528 × 1860 PNG files (1.4 MB, including the prompt text).



Google Imagen

Google Imagen example images

Imagen is Google’s answer to (Microsoft-funded) DALL·E. Imagen has a great name, and produces even greater images. The quality looks to be amazing and, according to Google’s (self-reported) benchmark challenge, is even better than DALL·E. I don't doubt it.

Unfortunately, for now we can’t test it ourselves, as Imagen is not open to the public, not even in private beta. We have to rely on the sample images on the Google Research Web site. (If you work for Google Brain and want to give me Beta access, please reach out!)

Access - not available to the public

Cost - n/a

Commercial use - n/a

Summary

If you work in a marketing agency, I encourage you to experiment with the above tools, and start to think about how these could be used in your creative workflow, now or in the future.

The text-to-image AI space is moving at a blinding pace, and there will no doubt be a huge impact on any industry that relies on image creation and manipulation.
