Google Veo: A New AI Video-Generation Model Launches at Google I/O 2024

Google is setting its sights on OpenAI's Sora with the launch of Veo, an innovative AI model capable of generating 1080p video clips approximately one minute long from a text prompt.

Introduced at Google’s I/O 2024 developer conference on Tuesday, Veo excels in capturing diverse visual and cinematic styles, including stunning landscape shots and captivating time lapses, while also allowing edits to previously generated footage.

“We’re investigating features like storyboarding and the generation of longer scenes to explore Veo’s full potential,” said Demis Hassabis, CEO of Google DeepMind, the company’s AI research division, during a virtual roundtable discussion. “Our progress in video technology has been remarkable.”

Veo builds upon Google’s initial commercial endeavors in video generation showcased in April, utilizing the Imagen 2 series of image-generating models to create brief looping video clips. In contrast to the Imagen 2-based tool, which produced low-resolution, short videos, Veo stands out as a strong competitor against other leading video generation models, including Sora, as well as offerings from startups like Pika, Runway, and Irreverent Labs.

During a briefing, Douglas Eck, head of generative media research at DeepMind, showcased handpicked examples of Veo's capabilities. One particularly striking example—a bird’s-eye view of a bustling beach—highlighted Veo’s advantages over competing models.

“The intricate details of all the beachgoers pose a challenge for both image and video generation models, especially with so many dynamic subjects,” Eck noted. “Upon close inspection, the waves appear impressive, and the vibrancy of the term ‘bustling’ is embodied by the lively beach filled with sunbathers.”

Like most generative AI models, Veo was trained on a large volume of footage: such models learn patterns from many examples in order to generate new data, in Veo’s case videos. When pressed about where the training footage came from, Eck declined to provide specifics but acknowledged that some content might have come from Google’s own YouTube platform.

“It’s possible that Google models are trained on some YouTube material, but always in line with our usual agreements with YouTube creators,” he stated.

While that may be true on paper, the reality is that YouTube’s dominant position leaves creators little choice but to accept Google’s terms if they want to reach the widest possible audience.

Recent reporting from The New York Times indicates that Google revised its terms of service last year to allow the company to tap more data for training its AI models. The previous terms were unclear about whether YouTube data could be used to build products beyond the video service itself; the new terms grant Google significantly more latitude.

Google is not alone in utilizing vast quantities of user data to train its in-house models, as seen with other tech giants, including Meta. Nonetheless, some creators may find it disheartening that Eck claims Google is setting the standard for ethical practices in this domain.

“To address the challenge of training data effectively, collaboration among all stakeholders is essential,” Eck remarked. “Without a unified approach involving the film industry, music industry, and the artistic community, progress will be slow.”

Despite this, Google has already made Veo accessible to select creators, including Donald Glover (a.k.a. Childish Gambino) and his creative agency, Gilga. Similar to OpenAI’s Sora, Google positions Veo as a valuable resource for creative professionals.

Eck highlighted that Google offers tools allowing webmasters to block the company’s bots from using their content for training purposes. However, these settings do not extend to YouTube, and Google provides no mechanism for creators to remove their work from its training datasets once it has already been scraped.

In discussing concerns about generative AI's potential for regurgitation—a situation where a model produces an exact copy of its training data—Eck referenced instances where systems like Midjourney recreated specific frames from popular films such as “Dune” and “Star Wars,” causing potential legal complications for users. To address copyright concerns, OpenAI has taken measures to block specific terms in prompts for Sora.

When asked about how Google is minimizing regurgitation risks with Veo, Eck stated there are filters in place to eliminate violent and explicit content and that DeepMind’s SynthID technology will label videos generated by Veo as AI-produced.

“We plan to gradually roll out Veo to a limited number of collaborators to deeply understand the model's implications before expanding its availability,” he explained.

Eck also shared insights into Veo’s technical features, describing it as “highly controllable,” with the ability to understand camera movements and visual effects from prompts such as “pan,” “zoom,” and “explosion.” Like Sora, Veo demonstrates a grasp of basic physics, including fluid dynamics and gravity, enhancing the realism of the generated videos.

Moreover, Veo supports masked editing, allowing users to alter specific areas within a video, and it can generate videos from a single still image, akin to generative models like Stability AI’s Stable Video. Intriguingly, by following a series of prompts that create a narrative, Veo can also produce longer videos that exceed one minute.

However, it's important to note that Veo is not without flaws. Like other generative AI models, it can produce inconsistencies, such as objects vanishing and reappearing without rationale, and it occasionally miscalculates physics, resulting in implausible actions like cars reversing abruptly.

This is why Veo will be available only through a waitlist on Google Labs for the foreseeable future, hosted in a new platform called VideoFX, dedicated to generative AI video creation and editing. As improvements continue, Google intends to integrate Veo's features into YouTube Shorts and other products.

“This project is very much a work in progress and remains experimental... There’s still much to be done,” Eck noted. “However, I believe we’re laying the groundwork for groundbreaking advancements in the filmmaking landscape.”
