Stability AI, renowned for its Stable Diffusion text-to-image generator, has launched its latest foundation model, Stable Video Diffusion (SVD). This model is now accessible via the company's developer platform and application programming interface (API), enabling third-party developers to integrate it into their apps, websites, and software solutions.
"This new addition provides programmatic access to a state-of-the-art video model tailored for various sectors. Our goal is to empower developers with an efficient method for seamlessly incorporating advanced video generation into their products," the company stated in a blog post.
While this release offers a powerful tool for enterprises aiming to create AI-generated videos, it also raises concerns. Stability AI has recently faced scrutiny for using the LAION-5B dataset, which was found to include instances of inappropriate content and has since been removed from circulation.
Despite these challenges, Stability's SVD API offers a competitive edge in video quality. According to a LinkedIn post by the company, the SVD model can generate a 2-second clip, made up of 25 generated frames plus 24 frames of FILM (Frame Interpolation for Large Motion) interpolation, in just 41 seconds. Though clips of this length may not suffice for extensive video campaigns, they are useful for creating GIFs and short, targeted messaging, including memes.
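To put those figures in perspective, the short Python sketch below works through the arithmetic. The four constants come from the LinkedIn post cited above; the function name and the derived metrics are illustrative only, and the sketch assumes the interpolated frames are simply added to the generated ones to form the final clip.

```python
# Figures quoted from Stability AI's LinkedIn post (as reported above).
GENERATED_FRAMES = 25
INTERPOLATED_FRAMES = 24
CLIP_SECONDS = 2
GENERATION_SECONDS = 41


def clip_stats(generated, interpolated, clip_seconds, generation_seconds):
    """Derive simple throughput numbers from the quoted figures.

    Assumption: the final clip contains every generated frame plus every
    interpolated frame, played back over `clip_seconds`.
    """
    total_frames = generated + interpolated
    return {
        "total_frames": total_frames,
        "effective_fps": total_frames / clip_seconds,
        "compute_seconds_per_video_second": generation_seconds / clip_seconds,
    }


stats = clip_stats(
    GENERATED_FRAMES, INTERPOLATED_FRAMES, CLIP_SECONDS, GENERATION_SECONDS
)
print(stats)
```

Under these assumptions the clip holds 49 frames, plays at roughly 24.5 frames per second, and costs about 20.5 seconds of processing per second of finished video. Note that 25 generated frames leave exactly 24 gaps between them, which is consistent with one interpolated frame per gap.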
SVD competes with video generation models from Runway and Pika Labs, the latter of which recently secured $55 million in funding and launched a new video editing platform. However, unlike Stability AI's model, these options are not available through an API, so users must access them directly on the companies' respective websites or apps.
Additionally, Stability AI plans to launch a user-facing web experience for its video generator, encouraging users to join a waitlist for early access.
Understanding Stable Video Diffusion
Introduced in a research preview a month ago, Stable Video Diffusion allows users to create MP4 videos from still images such as JPGs and PNGs. Initial samples show that the model produces short clips lasting up to two seconds; it is still in development and generates shorter videos than some research-oriented models.
However, multiple short clips can be combined to produce longer videos. Stability AI claims that the model can be beneficial across sectors like advertising, marketing, TV and film, and gaming.
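As a rough illustration of how short clips might be joined, the sketch below prepares input for ffmpeg's concat demuxer, a general-purpose tool that is not part of Stability AI's offering. The helper names and file names are hypothetical, and the sketch assumes ffmpeg is installed and that all clips share the same codec, resolution, and frame rate so they can be joined without re-encoding.

```python
def concat_list(clips):
    """Build the text of an ffmpeg concat-demuxer list file.

    Each line has the form: file 'path'. Single quotes inside a path
    are escaped as '\'' so the list file stays valid.
    """
    lines = []
    for clip in clips:
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output_path):
    """Return the ffmpeg argument list that joins the listed clips.

    "-c copy" copies the streams as-is, which only works when the
    clips were encoded with matching settings (an assumption here).
    """
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(list_path), "-c", "copy", str(output_path),
    ]


# Hypothetical clip names, for illustration only.
print(concat_list(["clip_01.mp4", "clip_02.mp4"]), end="")
print(" ".join(concat_command("clips.txt", "combined.mp4")))
```

To actually join the clips, the list text would be written to `clips.txt` and the argument list passed to `subprocess.run`. The sketch stops short of that so it can be read and run without ffmpeg present.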
Importantly, the latest model can generate videos in multiple layouts and resolutions, including 1024×576, 768×768, and 576×1024. It also features motion strength control and seed-based generation, allowing for both repeatable and random outputs.
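The options described above can be modeled as a small, validated settings object. The sketch below is not Stability AI's API: the class, function, and field names (`VideoRequest`, `make_request`, `motion_strength`, `seed`) are hypothetical, and the real endpoint may name, scale, or constrain these parameters differently. Only the three resolutions are taken from the article.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Output sizes listed in the article, as (width, height).
SUPPORTED_RESOLUTIONS = {(1024, 576), (768, 768), (576, 1024)}


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical settings for one generation request."""
    width: int
    height: int
    motion_strength: float  # scale is an assumption; passed through as-is
    seed: int


def make_request(width: int, height: int, motion_strength: float,
                 seed: Optional[int] = None) -> VideoRequest:
    """Validate the resolution and resolve the seed.

    A fixed seed is meant to give repeatable output; leaving it as None
    draws a fresh random seed for a different result on each call.
    """
    if (width, height) not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {width}x{height}")
    if seed is None:
        seed = random.randrange(2**32)
    return VideoRequest(width, height, motion_strength, seed)


repeatable = make_request(1024, 576, motion_strength=0.5, seed=42)
randomized = make_request(576, 1024, motion_strength=0.5)
print(repeatable)
print(randomized)
```

Recording the resolved seed alongside each output, as the frozen dataclass does here, is one way to make a randomly seeded result reproducible later.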
Navigating Controversy
The launch of Stable Video Diffusion gives businesses a streamlined way to integrate video capabilities. It also underscores Stability AI's determination to secure a foothold in the market amid ongoing controversies over the sources of its training data.
Recently, a report from the Stanford Internet Observatory revealed that the LAION-5B dataset, used to train popular AI models, contained instances of inappropriate material, prompting its removal. Additionally, Stability AI is facing a class-action lawsuit over its alleged use of copyrighted images, without permission, to create Stable Diffusion.
Currently, Stability AI's developer platform API offers access to all its models, including the Stable Diffusion XL text-to-image generator and the new SVD model. The company also provides a membership option for customers to host these models locally.