AI startups other than OpenAI kept shipping this week, pressing ahead with their product plans even as media attention swirled around the turmoil at OpenAI.
Take Stability AI, for instance. The company just unveiled Stable Video Diffusion, an AI model that generates videos by animating existing images. Based on Stability's popular Stable Diffusion text-to-image model, Stable Video Diffusion is one of the few open-source video generation models available today.
However, access to Stable Video Diffusion is currently limited. The model is in what Stability describes as a “research preview.” Users interested in running it must adhere to specific terms of use, which delineate acceptable applications (such as “educational or creative tools” and “artistic design processes”) versus prohibited uses (like creating “factual or true representations of people or events”).
Given the history of AI research previews, Stability's included, I wouldn't be shocked to see the model circulating on the dark web soon. If that happens, I'd worry about how it might be misused, since it doesn't appear to have any built-in content filter. When Stable Diffusion debuted, it quickly became a tool for bad actors, who used it to produce nonconsensual deepfake porn and other harmful content.
Stable Video Diffusion comprises two models: SVD and SVD-XT. SVD converts still images into 576×1024 videos of 14 frames, while SVD-XT raises the frame count to 24. Both models can generate video at between three and 30 frames per second.
A whitepaper released with Stable Video Diffusion states that both models were first trained on millions of video clips and then fine-tuned on a smaller set of hundreds of thousands to roughly one million clips. Where these clips came from is not entirely clear; the paper suggests many were drawn from publicly available research datasets, which makes it hard to tell whether any were under copyright. If some were, Stability and SVD's users could face legal and ethical complications.
Regardless of the data source, both SVD and SVD-XT can generate relatively high-quality four-second clips. Judging by the curated samples on Stability's blog, the outputs look competitive with those of Meta's recent video generation model, as well as with AI-generated examples from Google, Runway, and Pika Labs.
That said, Stable Video Diffusion has its limitations. Stability is open about them: its Hugging Face pages note that the models can't generate videos without motion, can't produce slow camera pans, can't render legible text, and don't consistently depict faces and people accurately. Still, Stability says the models are extensible and can be adapted to other use cases, such as generating 360-degree views of objects.
So, where is Stable Video Diffusion headed? Stability plans to release a "variety" of models that "build on and extend" SVD and SVD-XT, along with an upcoming web-based "text-to-video" interface that brings text prompting to the models. The ultimate goal appears to be commercialization: Stability cites potential applications in "advertising, education, entertainment, and more."
Stability clearly needs a hit, especially as investor pressure mounts. In April, reports indicated that Stability AI was burning through cash, prompting a hunt for executives who could ramp up sales. According to Forbes, the company has repeatedly delayed paying wages and payroll taxes, which led AWS, its cloud provider, to threaten to cut off its access to the GPU instances it relies on.
Recently, Stability AI secured $25 million through a convertible note, raising its total funding to over $125 million. However, it has yet to finalize new funding at a higher valuation; the startup was last valued at $1 billion and is reportedly seeking to quadruple that within months, despite facing low revenues and a significant cash burn.
Adding to Stability's challenges, Ed Newton-Rex, who served as VP of audio and was instrumental in launching Stability's music-generating tool, Stable Audio, has left the company. In a public resignation letter, he cited disagreements over the use of copyrighted data to train AI models.