Meta’s Movie Gen Model Delivers Realistic Videos with Sound: Unlimited Moo Deng Awaits!

No one yet fully understands the practical applications of generative video models, but that hasn't stopped industry giants like Runway, OpenAI, and Meta from investing millions in their development. Meta's newest creation, Movie Gen, seamlessly transforms text prompts into fairly realistic videos complete with sound—though, thankfully, not voice just yet. Importantly, they have opted against a public release.

Movie Gen is essentially a collection of foundation models, with the most prominent being its text-to-video capability. Meta claims it surpasses competitors like Runway’s Gen3, LumaLabs’ latest offering, and Kling1.5. However, these comparisons often serve more to establish participation in the competitive landscape than to definitively crown Movie Gen as the victor. Detailed technical specifications are available in the research paper published by Meta, which outlines all its components.

Audio for the videos is generated to align with the visuals. For example, you might hear engine sounds matching car movements or the rumble of a waterfall in the background, complemented by a crack of thunder when appropriate. The system can even add music when it suits the scene.

Meta trained Movie Gen using "a combination of licensed and publicly available datasets," which they referred to as "proprietary/commercially sensitive," providing no further details. This likely includes a vast array of Instagram and Facebook videos along with various publicly accessible content vulnerable to scraping.

The ultimate goal for Meta isn't just to claim fleeting recognition as the “state of the art” but to establish a comprehensive method that allows for the creation of high-quality videos from simple, natural language prompts. For example, a user could input something like, “imagine me as a baker crafting a shiny hippo cake during a thunderstorm.”

A common challenge with video generators is their inflexibility in editing. If you request a video of someone walking across the street and then decide you want them walking in the opposite direction, the whole shot is likely to regenerate differently. Meta addresses this with a text-based editing feature: users specify adjustments like, “change the background to a busy intersection” or “change her clothes to a red dress,” and the system aims to implement just those modifications while leaving the rest of the shot intact.
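To make that interaction pattern concrete, here is a minimal sketch of what such an edit request might look like. Movie Gen has no public API, so the `VideoEditRequest` type, its fields, and the clip name below are all hypothetical; this only illustrates the idea of a clip plus a localized natural-language instruction.

```python
from dataclasses import dataclass

@dataclass
class VideoEditRequest:
    """Hypothetical request for a localized, prompt-based video edit.

    Movie Gen exposes no public API; this sketch only illustrates the
    interaction the article describes: an existing clip plus a
    natural-language instruction, with everything else left untouched.
    """
    source_clip: str            # ID or path of the previously generated video (hypothetical)
    instruction: str            # the targeted change, in plain language
    preserve_rest: bool = True  # the key promise: only the named element changes

# Two localized edits against the same base clip.
edits = [
    VideoEditRequest("walk_across_street.mp4", "change the background to a busy intersection"),
    VideoEditRequest("walk_across_street.mp4", "change her clothes to a red dress"),
]

for edit in edits:
    print(f"edit {edit.source_clip!r}: {edit.instruction}")
```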

Camera movements are also recognized, meaning commands like “tracking shot” or “pan left” will be incorporated into the generated video. Although this still lacks the finesse of real camera control, it represents a notable improvement.

The model does have some unconventional limitations. It generates video just 768 pixels wide, a dimension shared with the now-outdated 1024×768 format and with the height of many “HD ready” displays (1366×768). Movie Gen upscales its output to 1080p, which is the basis for Meta’s claim of producing video at that resolution. The claim isn’t strictly accurate, but upscalers are good enough these days to merit some leniency.
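The arithmetic behind that claim is simple. A quick sketch, assuming a 16:9 clip at the stated 768-pixel width (the exact native dimensions are an assumption here, not something confirmed in this article):

```python
# Back-of-the-envelope math for the 1080p claim, assuming a 16:9 clip
# at the stated 768-pixel width.
native_w = 768
native_h = round(native_w * 9 / 16)   # 432 for a 16:9 frame

target_w, target_h = 1920, 1080       # standard 1080p
scale = target_w / native_w           # 2.5x in each dimension

print(f"native:   {native_w}x{native_h}")
print(f"upscaled: {round(native_w * scale)}x{round(native_h * scale)}")  # 1920x1080
print(f"output pixels interpolated per native pixel: {scale ** 2 - 1:.2f}")  # 5.25
```

Each native pixel becomes 6.25 output pixels at a 2.5× scale, so most of the “1080p” detail is interpolated rather than generated.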

Interestingly, the model generates up to 16 seconds of video at 16 frames per second, a frame rate nobody has ever asked for. Alternatively, it can produce 10 seconds of video at a more conventional 24 FPS.
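Notably, the two modes work out to almost the same total number of frames, which suggests, though Meta doesn’t say so, that the real constraint is a frame budget rather than a duration limit:

```python
# The two advertised modes come out to nearly the same total frame count,
# which hints (our inference, not Meta's statement) that the underlying
# constraint is a frame budget rather than clip length.
modes = {
    "16 s @ 16 fps": 16 * 16,   # 256 frames
    "10 s @ 24 fps": 10 * 24,   # 240 frames
}
for name, frames in modes.items():
    print(f"{name}: {frames} frames")
```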

As for the absence of voice generation, there are likely two factors at play. First, synchronized speech is a much harder problem than ambient sound: matching speech to lip movements and corresponding facial expressions adds substantial complexity, making it a prudent choice to postpone this capability. A prompt like “a clown delivering the Gettysburg Address while riding a tiny bike in circles” could quickly turn into viral chaos.

The second factor appears to be political: launching what amounts to a deepfake generator just ahead of a major election carries obvious reputational risk. By limiting the model’s capabilities, Meta raises the bar for misuse, ensuring that any malicious actor would have to do real work to abuse it. Combining this generative model with speech synthesis and lip-syncing tools is certainly feasible, but generating a candidate making outrageous claims is just as certainly not advisable.

“Movie Gen is currently a purely experimental AI research concept, and maintaining safety is our foremost priority, as it has been with all our generative AI technologies,” a Meta representative said in response to inquiries.

Unlike Meta’s Llama large language models, Movie Gen won’t be publicly accessible. Anyone can attempt to replicate its methods from the research paper, but the code itself will remain unpublished, aside from the “underlying evaluation prompt dataset,” which records the prompts used to generate the test videos.
