Kuaishou Unveils Groundbreaking “Image-to-Video” Feature in Keling Model
Kuaishou has upgraded its Keling video generation model with a new "Image-to-Video" feature. Users can turn a static image into a 5-second video and steer its motion with text prompts. The update also adds a video continuation feature that extends a video in 5-second increments, up to 3 minutes in total.
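The continuation arithmetic can be made concrete with a small sketch. It assumes the initial clip and every extension are exactly 5 seconds and that 3 minutes is a hard cap. The function name `extensions_needed` is hypothetical and is not part of any Kuaishou API.

```python
CLIP_SECONDS = 5        # length of the initial clip and of each extension (from the article)
MAX_SECONDS = 3 * 60    # upper limit stated in the article: 3 minutes


def extensions_needed(target_seconds, clip_seconds=CLIP_SECONDS, max_seconds=MAX_SECONDS):
    """Return how many continuations are needed to reach at least target_seconds.

    Hypothetical helper for illustration only; it models the limits
    described in the article, not an actual Keling interface.
    """
    if target_seconds <= 0:
        raise ValueError("target length must be positive")
    if target_seconds > max_seconds:
        raise ValueError("target length exceeds the stated 3-minute limit")
    # Ceiling division gives the total number of 5-second segments;
    # the first segment is the initial clip, the rest are continuations.
    total_segments = -(-target_seconds // clip_seconds)
    return total_segments - 1


if __name__ == "__main__":
    for seconds in (5, 12, 60, 180):
        print(seconds, "seconds ->", extensions_needed(seconds), "extensions")
```

Under these assumptions, a full 3-minute video is one initial clip plus 35 continuations.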
The "Image-to-Video" feature analyzes an uploaded image and turns it into a dynamic 5-second clip. Keling's 3D spatio-temporal attention mechanism helps it generate complex motion and keeps the transition from still image to video smooth. Text prompts improve the model's grasp of both the image's semantics and the user's intent, so the same image can yield different motion depending on the text supplied. This pairing of text and image broadens Keling's storytelling potential.
Keling accepts a wide range of inputs, supporting both realistic and stylized images in different aspect ratios. The other notable addition in this update is video continuation: users can extend an initial 5-second video, created in either "Text-to-Video" or "Image-to-Video" mode, into a longer one. Continuation works with the same text control mechanism, so users can guide the extended footage with different prompts while transitions between segments stay smooth.
The Keling video generation model launched on June 6 with a "Text-to-Video" capability, showing performance comparable to top-tier video generation models. It is currently available for testing in the Kuaishou app, where around 140,000 users are set to try it.
Keling is also expected to power upcoming features. "AI Dance King" will let users upload a full-body or half-body photo and instantly generate a dance video. "AI Singing and Dancing" will create singing and dancing videos from a single image, animating both facial expressions and body movements.
Developed by Kuaishou's AI team, the Keling model combines a Sora-like technical approach with Kuaishou's own innovations. It generates 1080p video up to 2 minutes long at 30 fps and supports customizable aspect ratios. Kuaishou continues to build out its AI portfolio, which includes the large language model "Kuaiyi" with 175 billion parameters, the text-to-image model "Ketu," and research such as Direct-a-Video, Video-LaViT, and I2V-Adapter.
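To give a sense of scale for those figures, the sketch below works out how many frames the stated limits imply. It assumes 1080p means 1920x1080 (16:9 landscape), although the article notes that aspect ratios are customizable, and it assumes 5-second clips use the same 30 fps. The name `frame_count` is illustrative only.

```python
FPS = 30                      # frame rate stated in the article
MAX_SECONDS = 2 * 60          # stated maximum video length: 2 minutes
WIDTH, HEIGHT = 1920, 1080    # assumption: 1080p in 16:9 landscape


def frame_count(seconds, fps=FPS):
    """Number of frames in a video of the given length at the given frame rate."""
    return seconds * fps


frames_per_clip = frame_count(5)           # one 5-second clip (assumed 30 fps)
frames_at_max = frame_count(MAX_SECONDS)   # a full 2-minute video
pixels_per_frame = WIDTH * HEIGHT          # pixels in one assumed 1920x1080 frame

if __name__ == "__main__":
    print("frames per 5-second clip:", frames_per_clip)
    print("frames in a 2-minute video:", frames_at_max)
    print("pixels per frame:", pixels_per_frame)
```

Under these assumptions, a 5-second clip is 150 frames and a 2-minute video is 3,600 frames, each frame holding about 2.07 million pixels.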