Meta Advances AI Technology for Movie Creation

Like “Avengers” director Joe Russo, I’m becoming increasingly convinced that fully AI-generated movies and TV shows will be a reality in our lifetime. Recent advancements in artificial intelligence, especially OpenAI's remarkably realistic text-to-speech technology, have provided exciting glimpses into this innovative frontier. However, today’s announcement from Meta has particularly sharpened my focus on the future of AI-generated content.

This morning, Meta unveiled Emu Video, an evolution of its image generation tool, Emu. Given a caption (e.g. “A dog running across a grassy knoll”), an image, or a photo paired with a description, Emu Video can generate a four-second animated clip.

These clips can be further refined using another AI model named Emu Edit, also introduced today. Users can make adjustments in natural language — for example, “the same clip, but in slow motion” — and witness their changes in a newly rendered video.
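Meta hasn’t released code, weights, or a public API for Emu Video, so there’s nothing of it to call directly. As a purely illustrative sketch of the same caption-to-clip workflow, here is roughly what the equivalent looks like with an open text-to-video model (ModelScope’s 1.7B checkpoint) driven through Hugging Face’s diffusers library; the model ID, frame count, and output filename are my own stand-ins, not anything Meta shipped.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Illustrative only: ModelScope's open text-to-video model, not Meta's Emu Video.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "A dog running across a grassy knoll"

# Emu Video reportedly outputs ~4 seconds at 16 fps (64 frames); this open model
# is tuned for much shorter clips, so we ask for a modest 16 frames here.
result = pipe(prompt, num_frames=16)
frames = result.frames[0]  # recent diffusers returns a batch; older versions return the frame list directly

# Write the frames out as an .mp4 file and print the resulting path.
print(export_to_video(frames, "dog_knoll.mp4"))
```

Emu Edit’s natural-language revisions (“the same clip, but in slow motion”) have no direct equivalent in this open pipeline; you would re-prompt or post-process the frames instead.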

Video generation technology isn’t new; Meta and Google have both experimented with it before, and startups like Runway are already building businesses on it. What sets Emu Video’s 512x512-pixel, 16-frames-per-second clips apart is their fidelity: my untrained eye has trouble distinguishing some of them from the real thing.

That said, Emu Video seems most successful at animating simple, largely static scenes (like waterfalls or timelapses of city skylines) that deliberately stray from photorealism, rendered in styles such as cubism, anime, paper cut craft, and steampunk. One clip of the Eiffel Tower at dawn, depicted in a painterly style and reflected in the River Seine, could pass for an e-card from American Greetings.

Even in its best renditions, however, Emu Video has its quirks: strange physics, like skateboards that glide parallel to the ground, and odd appendages, with toes curling behind feet and legs blending into one another. Objects also tend to appear and vanish erratically, as with the birds in the Eiffel Tower clip.

After exploring many of Emu Video’s creations (or the curated selections Meta provided), I noticed another key issue: the subjects in the clips often lack dynamic action. For instance, an anthropomorphized raccoon in one clip holds a guitar but does not strum it, despite the caption suggesting it should. Similarly, two unicorns appear “interested” in a chess game but do not move any pieces.

Clearly, improvements are needed. Nevertheless, Emu Video’s simpler b-roll could seamlessly fit into contemporary movies or TV shows — a reality that raises serious ethical concerns for me.

The risks of deepfakes aside, I worry about the livelihoods of animators and artists who create the kinds of scenes that AI like Emu Video can now imitate. While Meta and its generative AI competitors may argue that Emu Video is designed to augment, not replace, human artists — a view expressed by Meta CEO Mark Zuckerberg regarding its integration into Facebook and Instagram — I find that perspective overly optimistic, especially when financial factors are in play.

Earlier this year, Netflix employed AI-generated backgrounds in a three-minute animated short, claiming it could alleviate the anime industry’s labor shortage, while ignoring the low pay and harsh conditions driving artists away from the field. In another incident, the studio responsible for the opening sequence of Marvel’s “Secret Invasion” acknowledged the use of AI, primarily the text-to-image tool Midjourney, to craft much of the artwork. Despite the director’s justification regarding the show’s themes, the artist community and fans strongly opposed this approach.

Actors may also face uncertainty, as the recent SAG-AFTRA strike highlighted concerns about the use of AI to create digital likenesses. While studios ultimately agreed to compensate actors for AI-generated replicas, there’s a possibility they may reconsider as the technology advances.

Compounding these challenges is the fact that AI tools like Emu Video are generally trained on content created by artists, photographers, and filmmakers without their consent or compensation. Meta’s whitepaper accompanying Emu Video merely states that the model was trained on a dataset of 34 million “video-text pairs” ranging from five to sixty seconds, without specifying the origin, copyright status, or licensing of these videos. (After this article was published, a Meta spokesperson clarified that the model was trained on “data from licensed partners.”)

Efforts to establish industry-wide standards that would let artists opt out of AI training, or be paid for their contributions, are still in their infancy. But as Emu Video demonstrates, the technology keeps racing ahead of the ethics; perhaps it is already too far ahead.
