OpenAI’s Latest Model, Sora, Can Create High-Quality Videos — Here’s What You Need to Know

OpenAI is stepping into the realm of video generation, joining the ranks of startups like Runway and major companies such as Google and Meta. Today, OpenAI introduced Sora, a generative AI model designed to create video content from text. Whether provided with a brief description or an elaborate prompt, Sora can produce 1080p movie-like scenes featuring multiple characters, varied movements, and intricate background details, according to the company.

Sora also has the ability to "extend" existing video clips, attempting to fill in any gaps or missing information. "Sora possesses a profound grasp of language, allowing it to accurately interpret prompts and generate dynamic characters that showcase vivid emotions," OpenAI states in a recent blog post. "The model comprehends not only what you request in your prompt but also how these elements exist in the real world."

While OpenAI's promotional material for Sora may come off as somewhat embellished—such as the quoted statement—some selected samples from the model showcase impressive capabilities, especially when compared to other text-to-video technologies currently available.

Sora distinguishes itself by being able to create videos in various styles (photorealistic, animated, black and white) lasting up to a minute—significantly longer than most available text-to-video models. These videos also remain reasonably coherent, largely avoiding what I refer to as "AI weirdness," where objects move in physically implausible directions.

For instance, take a look at this virtual tour of an art gallery generated entirely by Sora (please overlook the graininess caused by my video-GIF conversion tool):

And here's an animation of a flower blooming:

However, some of Sora’s videos featuring humanoid subjects—a robot against a city skyline, for example, or a person strolling through a snowy path—do exhibit a somewhat video game-like quality, perhaps due to minimal background activity. Additionally, "AI weirdness" occasionally arises in many clips, such as cars unexpectedly reversing or arms appearing to melt into a duvet cover.

To its credit, OpenAI recognizes that Sora is not flawless. The company admits:

"Sora may encounter challenges in accurately simulating the physics of complex scenes and might misinterpret specific cause-and-effect relationships. For instance, a person may bite a cookie, yet the cookie may not show a bite mark afterward. The model can also confuse spatial cues, such as mixing up left and right, and may have difficulty accurately depicting events that unfold over time, including tracking a particular camera trajectory."

OpenAI positions Sora as a research preview and has disclosed limited information regarding the training data—around 10,000 hours of "high-quality" video—while withholding general access to the model. This caution stems from potential misuse; OpenAI rightly notes that individuals with malicious intent could exploit a model like Sora in numerous ways.

The company states it is collaborating with experts to identify potential vulnerabilities in the model and is developing tools to verify whether a video was created by Sora. Furthermore, if OpenAI decides to turn this model into a publicly available product, it aims to include provenance metadata in the generated outputs.

"In our engagement with policymakers, educators, and artists worldwide, we strive to understand their concerns and explore positive applications for this emerging technology," OpenAI explains. "Despite conducting extensive research and testing, we cannot foresee all the positive and negative ways in which people will use our technology. This is why we believe that learning from real-world applications is essential for building increasingly safe AI systems in the future."
