On September 24, ByteDance's Volcano Engine unveiled two new AI video models, marking the company's entry into the crowded AI video market. Major tech firms and startups alike have been pushing into this sector. In late August, the AI startup MiniMax launched its own video model, and on September 19 Alibaba released its Tongyi Wanxiang video model. By the Economic Observer's incomplete count, more than 10 companies in China have introduced AI video products in the past four months alone.
Asked in an interview about ByteDance's late entry into the market, Volcano Engine President Tan Dai said the company does not feel pressured to be first. ByteDance views AI models as a long-term technology that could shape the next 10 to 20 years, so it is prioritizing quality and reliability over speed.
The surge in AI video offerings reflects a growing recognition of demand for video content across sectors, including entertainment, e-commerce, and local services. Tan noted that video has become a core user need. MiniMax founder Yan Junjie made a similar point: reaching high user engagement and coverage requires dynamic video content, not text alone.
With its Douyin and JianYing platforms, ByteDance has a distinct advantage in video, particularly because of its extensive content library. By contrast, one AI video startup founder said their training data relies largely on overseas open-source datasets, AI-generated content, and licensed materials. Tan added that ByteDance's deep understanding of the video industry, combined with its technology, significantly strengthens its Doubao video model. The model is multimodal, handling text, music, video, and images, which helps it better understand user instructions.
What sets the Doubao video model apart is its ability to generate complex interactive scenes with multiple subjects and to keep content consistent across multi-camera transitions, producing more polished output than typical AI-generated video.
However, despite the growing number of entrants, the AI video sector has been slow to improve production quality and ease of use.