On September 24, ByteDance's Volcano Engine hosted an AI Innovation Tour in Shenzhen, unveiling two new AI models: Doubao Video Generation-PixelDance and Doubao Video Generation-Seaweed. The event also showcased significant upgrades to several of the company's large AI models. Attendees from Lei Technology noted four distinct exhibition zones highlighting the most popular consumer AI applications: AI Adventures, AI Music, AI Assistants, and AI Bots, covering entertainment, creativity, Q&A, and personalized AI solutions.
Among the showcased applications, AI Bots garnered considerable attention as a new platform for developers. It draws on the comprehension abilities of large AI models to help users create custom applications without programming skills: users simply specify their requirements, and the AI handles the rest. For instance, Lei Technology recently launched Xiaolei Bot, which draws on a repository of articles and product evaluations accumulated over ten years to provide shopping advice and answer product-related questions.
In AI Adventures, users can assume a character role, interact with various elements of the story, and influence its development in real time, so that every user shapes their own narrative. The technology also offers game developers new ideas for storyline creation, opening up a wide range of narrative possibilities.
AI Q&A services, a well-known application of large models, have undergone multiple upgrades and now support context lengths of up to 256K as well as complex logical reasoning, catering to diverse user inquiries. AI Music is covered further below, alongside the music generation model.
In video creation, Volcano Engine benefits from ByteDance's position at the forefront of the global video creation trend sparked by TikTok. Its video generation model goes beyond mere visual generation, emphasizing the user's experience of both the creative process and the result.
Volcano Engine wants users to achieve near-photographic quality in their video production. To this end, the model has been extensively optimized to handle complex instructions and produce dynamic camera movements similar to those used in professional filming. It can replicate various cinematic effects, such as zooming, panning, and object tracking, allowing creators to present their ideas with a realism rarely seen in previous video generation models.
Additionally, the model addresses common issues in generated video, such as sudden changes in clothing, accessories, lighting, and style, which have historically troubled users. Viewers notice even minor inconsistencies, so these discrepancies can significantly diminish the viewing experience. By leveraging features of the DiT (Diffusion Transformer) architecture, Volcano Engine's video generation model can tag generated visual elements and keep them consistent in later shots. Absolute consistency is not guaranteed, but these advances minimize noticeable shifts in style or appearance and greatly reduce visual glitches.
Moreover, challenges such as multi-action commands and inserting a new character mid-scene have been addressed. For instance, one demonstration featured a scene in which a woman angrily looks to the side and puts on glasses, after which a man enters the frame and embraces her. Unlike in traditional filming, where actors and crew manage this, the AI itself must handle the facial expressions, the sequence of actions, and the newly introduced elements correctly to produce an acceptable video.
During the tour, Volcano Engine displayed numerous AI-generated videos produced with the Doubao Video Generation model, showcasing everything from multi-character interactions to extended action shots, all while maintaining a fluid experience and consistent thematic elements. This technology meets everyday creative needs, empowering anyone to create professional-quality videos from the comfort of their homes.
To cater to varied creative styles, Volcano Engine has deeply optimized the Transformer architecture, significantly enhancing the Doubao video model's versatility. The model can now accommodate various styles, including 3D and 2D animation, traditional Chinese painting, black and white, and thick-paint illustration, while supporting multiple video formats, which broadens its applicability across diverse fields.
Beyond the video model, the event also highlighted upgrades to the Doubao universal model and the music generation model, offering improved experiences across various domains. The music generation model, in particular, demonstrated its capabilities by quickly producing catchy tunes and lyrics based on user requests.
Volcano Engine has successfully integrated the entire AI creation workflow: the universal model generates scripts, the image generation model sets the visual foundation, video and music generation models create the final content, and AI-assisted editing tools streamline the editing process, reducing barriers to video creation like never before.
Additionally, Volcano Engine introduced a new digital avatar generation application that creates lifelike digital humans within minutes, complete with voice cloning. The technology serves various applications, from live streaming and online teaching to customer service dialogues. The digital avatars can also switch languages seamlessly using an integrated real-time translation model, a feature that attracted considerable interest given the shortage of multilingual hosts in the expanding overseas streaming market.
By expanding its offerings from video and music to digital avatars, Volcano Engine is building a comprehensive AI creative ecosystem and driving broader industry applications for AI technology. As these innovations continue to evolve, content creators and businesses alike will be able to find more efficient ways to create, heralding a new era of intelligent creative work.
To accommodate the growing demand for AI models, Volcano Engine is continuously expanding the capacity of its large models. While most models in the industry support a maximum of 300K tokens per minute (TPM), Doubao's large model has raised this limit to 800K TPM, a level described as two to eight times typical industry limits, while allowing flexible scaling and reduced computational costs.
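As a rough check on these throughput figures, the short Python sketch below works through the arithmetic. The two TPM values come from the article; the workload size is a hypothetical number chosen only for illustration. Note that 800K against 300K is a ratio of roughly 2.7, so the upper end of the "two to eight times" range would imply a baseline of about 100K TPM, which is an inference from the arithmetic and not a figure stated here.

```python
# Figures quoted in the article, in tokens per minute (TPM).
TYPICAL_TPM = 300_000
DOUBAO_TPM = 800_000


def speedup(new_tpm: int, base_tpm: int) -> float:
    """How many times larger new_tpm is than base_tpm."""
    return new_tpm / base_tpm


def implied_baseline(new_tpm: int, multiple: float) -> float:
    """Baseline TPM that would make new_tpm exactly `multiple` times larger."""
    return new_tpm / multiple


def minutes_to_process(total_tokens: int, tpm: int) -> float:
    """Minutes needed to push total_tokens through a given TPM limit."""
    return total_tokens / tpm


if __name__ == "__main__":
    # Hypothetical workload, not from the article.
    workload_tokens = 24_000_000

    print(f"800K vs 300K: {speedup(DOUBAO_TPM, TYPICAL_TPM):.2f}x")
    print(f"Baseline implied by 8x: {implied_baseline(DOUBAO_TPM, 8):,.0f} TPM")
    print(f"At 300K TPM: {minutes_to_process(workload_tokens, TYPICAL_TPM):.0f} min")
    print(f"At 800K TPM: {minutes_to_process(workload_tokens, DOUBAO_TPM):.0f} min")
```

Under these assumptions, a batch job limited only by the TPM quota would finish in well under half the time at the higher limit; real throughput also depends on factors such as request concurrency and latency, which this sketch ignores.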