Imagen 3, Google's latest text-to-image foundation model, is coming to the Vertex AI platform. It will be available in preview to select customers, offering developers faster image generation, better prompt comprehension, more photorealistic depictions of people, and improved text rendering compared with previous versions.
Google introduced Imagen 3 at Google I/O in May and first made it available as a private preview in ImageFX for select creators. The announcement also said the model would soon be accessible via Vertex AI.
Douglas Eck, senior research director at Google DeepMind, emphasized its capabilities, stating, “It’s our most capable image generation model yet. Imagen 3 is more photorealistic, richer in detail, and it minimizes visual artifacts. It comprehends prompts crafted in a natural, creative manner—detailed instructions yield the best results. Additionally, it excels at incorporating subtle details from longer prompts and improves text rendering, a persistent challenge in earlier image generation models.”
With the transition to Vertex AI, Imagen 3 introduces multi-language support, robust safety features such as Google DeepMind’s SynthID digital watermarking, and support for various aspect ratios.
Shutterstock, a leader in stock photography, has already integrated this model. Justin Hiza, vice president of data services at Shutterstock, remarked, “Since incorporating Imagen into our AI image generator, our users have created millions of images. We’re thrilled about the improvements Imagen 3 offers, allowing users to realize their ideas more quickly without compromising quality. This enhancement further solidifies Shutterstock’s commitment to an ethically-sourced AI image generator, ensuring safety and protection through Google Cloud’s indemnification for generative AI.”
While Google continues to develop Imagen, image generation in its Gemini AI remains unavailable after the feature drew criticism over inaccuracies. During a recent press briefing, Google Cloud CEO Thomas Kurian clarified the difference between the two models: “Gemini is a multimodal model designed to process diverse types of input, including images, video, and audio, enabling reasoning across these modalities. In contrast, Imagen is a diffusion model focused solely on generating high-fidelity text-to-image outputs. They serve distinct purposes.”
Google has not disclosed when Gemini’s image functionality will be re-enabled.