Google Unveils Veo: A Stunning New Generative AI Video Model to Compete with OpenAI’s Sora

Since OpenAI introduced its Sora generative AI video creation model earlier this year, few competitors have matched its realism and quality—until now.

At its annual I/O developer conference, Google announced Veo, a new generative AI video model developed by its renowned DeepMind division.

Google describes Veo as capable of generating “high-quality, 1080p clips exceeding 60 seconds.” According to a post on DeepMind's X account, the model handles a variety of cinematic styles, from photorealism and surrealism to animation.

On its product page, Google states that Veo aims to “make video production accessible to everyone,” whether users are seasoned filmmakers, aspiring creators, or educators. Veo supports text-to-video, video-to-video, and image-to-video transformations.

In partnership with polymath artist Donald Glover, also known as Childish Gambino, Google tested Veo’s new features through his creative studio, Gilga.

Demonstrating Veo's impressive capabilities, DeepMind showcased several generated videos on its YouTube and X accounts, featuring scenes like a neon city, lifelike jellyfish, cowboys riding horses, spaceships exploring the cosmos, and human interactions. The results closely mimic live-action footage and skillfully crafted animation, all created from simple text prompts.

In a blog post by Google VP Eli Collins and Senior Research Director Douglas Eck, Veo is highlighted for its “unprecedented level of creative control,” with a strong understanding of cinematic terms such as “timelapse” and “aerial shots.”

Moreover, Veo facilitates quick, high-quality edits to both AI-generated and user-uploaded videos, including pre-recorded footage. For example, a user can enter an editing command, such as adding kayaks to an aerial coastline shot, and Veo applies the change to the original video.

Veo also excels at maintaining consistency across video frames, addressing some inconsistencies typically found in other models, including Sora. It achieves this through advanced latent diffusion transformers, ensuring characters and objects remain cohesive and realistic.

To enhance its performance, Google improved the training data captions and utilized high-quality compressed video representations. This optimization boosts overall video quality and reduces generation time.

All videos generated by Veo are embedded with SynthID, Google’s content credentialing watermark, confirming their AI-generated status.

Veo represents years of DeepMind research, building on previous innovations such as Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere.

Veo is not yet publicly available. As OpenAI did with Sora, Google is offering it only to select creators, through a private preview in VideoFX. Google plans to eventually integrate some of Veo’s features into YouTube Shorts and other products.
