The text-to-image revolution, explained

How programmers turned the internet into a paintbrush. DALL-E 2, Midjourney, Imagen, explained.

Since January 2021, advances in AI research have produced a plethora of deep-learning models capable of generating original images from simple text prompts, effectively extending the human imagination.


Researchers at OpenAI, Google, Facebook, and elsewhere have developed text-to-image tools that they have not yet released to the public, while similar models have proliferated online in the open-source arena and at smaller companies like Midjourney.


These tools represent a massive cultural shift because they remove the requirement for technical labor from the process of image-making.


Instead, they select for creative ideation, skillful use of language, and curatorial taste.


The ultimate consequences are difficult to predict, but, like the invention of the camera and later the digital camera, these algorithms herald a new, democratized form of expression that will set off another explosion in the volume of imagery produced by humans.


But, like other automated systems trained on historical data and internet images, they also come with risks that have not been resolved. 

Vox