Stability AI Unveils SVD 1.1: A Step Forward in AI Video Generation
Stability AI, renowned for its expanding suite of open-source AI models for content creation and coding, has announced an upgrade to its image-to-video latent diffusion model, known as Stable Video Diffusion (SVD).
Introducing SVD 1.1
The new version, SVD 1.1, is a refined iteration of SVD 1.0, optimized to generate short AI videos with improved motion and greater consistency. Tom Mason, CTO of Stability AI, confirmed that SVD 1.1 is now publicly available for download via Hugging Face. It will also be included in Stability AI's subscription memberships, which offer tiers for individuals and enterprises ranging from free to $20 per month. Commercial users will need a subscription to deploy the model, while research use remains open and free.
Enhanced Features of SVD 1.1
When Stable Video Diffusion launched in November 2023, Stability AI introduced two models for AI video generation: SVD, which created four-second videos with up to 14 frames from a still image, and SVD-XT, a fine-tuned version generating up to 25 frames. Building on SVD-XT, the newly released SVD 1.1 also generates four-second videos, producing 25 frames at a resolution of 1024×576 when given a context frame of the same size.
Importantly, this upgrade targets greater consistency in video output than earlier versions delivered. Previous models occasionally fell short of photorealism, produced little motion, and struggled to render realistic faces and people. SVD 1.1 seeks to resolve these issues, promising improved motion dynamics in the final outputs.
According to the company, "Fine-tuning for SVD 1.1 was conducted with fixed conditioning at 6 FPS and motion bucket ID 127 to enhance output consistency without the need for hyperparameter adjustments." While these settings are still adjustable, performance might differ outside these fixed conditions.
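For a concrete sense of how these settings are applied, the snippet below is a minimal sketch using the Hugging Face diffusers StableVideoDiffusionPipeline: it feeds a 1024×576 context frame to the model and requests 25 frames with the fixed conditioning values quoted above. The repository name for the SVD 1.1 weights is an assumption based on Stability AI's naming of earlier checkpoints; check the Hugging Face model page (and its license gate) before relying on it.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Assumed Hugging Face repository name for the SVD 1.1 weights; verify on the model page.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# A single context frame at the model's native 1024x576 resolution.
image = load_image("context_frame.png").resize((1024, 576))

# Generate 25 frames using the fixed conditioning described above:
# 6 FPS and motion bucket ID 127. decode_chunk_size trades speed for VRAM.
frames = pipe(
    image,
    num_frames=25,
    fps=6,
    motion_bucket_id=127,
    decode_chunk_size=8,
).frames[0]

export_to_video(frames, "svd_1_1_clip.mp4", fps=6)
```

Raising or lowering motion_bucket_id nudges the amount of motion in the resulting clip; that is the knob the quoted fine-tuning holds fixed at 127.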
Performance and Future Prospects
Although Stability AI asserts enhancements with SVD 1.1, its real-world effectiveness remains to be evaluated. The Hugging Face page for the model emphasizes its research-oriented design and acknowledges that some challenges from previous versions may persist.
In addition to Hugging Face, the Stable Video Diffusion models are accessible via an API on the Stability AI developer platform, allowing developers to integrate video generation into their own applications. The Stable Video Diffusion API generates four seconds of video at 24 FPS in MP4 format, yielding 25 generated frames along with interpolated frames. Features such as motion strength control and support for multiple layouts and resolutions (1024×576, 768×768, and 576×1024) round out the offering.
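As an illustration of that API workflow, the sketch below submits a context frame and polls for the finished MP4. It is a rough approximation: the endpoint paths, field names, and the asynchronous submit-then-poll pattern are assumptions drawn from Stability AI's public developer documentation at the time of writing, so confirm them against the current docs before use.

```python
import time
import requests

API_KEY = "sk-your-stability-key"  # Stability AI developer platform key
# Assumed endpoint; check the Stability AI developer docs for the current path.
BASE_URL = "https://api.stability.ai/v2beta/image-to-video"

# Submit a 1024x576 (or 768x768 / 576x1024) context frame to start a generation job.
with open("context_frame.png", "rb") as f:
    start = requests.post(
        BASE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"image": f},
        # motion_bucket_id controls motion strength; field names are assumptions.
        data={"motion_bucket_id": 127, "seed": 0},
    )
start.raise_for_status()
generation_id = start.json()["id"]

# Poll until the four-second, 24 FPS MP4 (25 generated frames plus interpolation) is ready.
while True:
    result = requests.get(
        f"{BASE_URL}/result/{generation_id}",
        headers={"Authorization": f"Bearer {API_KEY}", "Accept": "video/*"},
    )
    if result.status_code == 202:  # still in progress
        time.sleep(10)
        continue
    result.raise_for_status()
    with open("generated_clip.mp4", "wb") as out:
        out.write(result.content)
    break
```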
Looking Ahead
In 2023, Stability AI made significant strides in generative AI with frequent model updates, a trend that appears set to continue into 2024. The company, founded in 2019, has attracted considerable investment, including a $101 million funding round in 2022. However, it faces competition from other players in the AI video generation space, like Runway and Pika, both gaining traction with user-friendly web platforms that also offer video customization and upscaling.
Recently, Runway introduced its Multi Motion Brush feature, which lets users animate specific areas of their AI videos. Similarly, Pika allows users to edit selected regions of a video, such as transforming a cow's face into a duck's. Neither company, however, offers a public API for its models, which restricts integration into third-party applications.
As the landscape of AI video generation continues to evolve, Stability AI's SVD 1.1 marks an exciting advancement worth watching.