On June 14, the Beijing ZhiYuan Conference, hosted by the ZhiYuan Institute at the Zhongguancun Display Center, opened with a keynote by institute director Wang Zhongyuan, who presented the institute's latest research in language models, multimodal systems, embodied intelligence, and biological computing. In a media interview, Wang highlighted how far China's large models have advanced over the past year, noting that while they have reached a practical level, their capabilities still need significant improvement.
During the event, the ZhiYuan Institute launched a comprehensive suite of large models along with an updated full-stack open-source technology platform. To address the substantial computational demands of training large models, the institute partnered with China Telecom's TeleAI to introduce Tele-FLM-1T, the world's first low-carbon, dense trillion-parameter language model. The model extends the Tele-FLM series, which already includes 52B and 102B versions. Notably, the Tele-FLM series was developed with a focus on sustainability, consuming only about 9% of the computational resources typical in the industry: training ran for four months on 112 A800 servers and processed 2.3 trillion tokens in total.
To combat issues like model hallucination, the ZhiYuan Institute introduced the BGE series of semantic vector models, which power retrieval-augmented generation (RAG) pipelines by providing accurate semantic matching across data sources. The BGE series has since become the most downloaded AI model in China.
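To make the role of these vector models concrete, below is a minimal sketch of the retrieval step in a RAG pipeline using a BGE embedding checkpoint via the sentence-transformers library. The specific checkpoint name, the sample corpus, and the query are illustrative assumptions, not details from the announcement.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Load a BGE embedding model (checkpoint choice is an assumption;
# any BGE model published on Hugging Face loads the same way).
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Passages the generator's answers should be grounded in (illustrative only).
corpus = [
    "Tele-FLM-1T is a dense trillion-parameter language model.",
    "Emu3 is a multimodal world model trained jointly on images, video, and text.",
    "FlagOpen 2.0 is a full-stack open-source platform for large-model development.",
]
doc_emb = model.encode(corpus, normalize_embeddings=True)

# Embed the query and rank passages by cosine similarity
# (a plain dot product, since the embeddings are L2-normalized).
query_emb = model.encode(["Which model handles video input?"], normalize_embeddings=True)
scores = doc_emb @ query_emb.T
print(corpus[int(np.argmax(scores))])
# The top-ranked passage would then be prepended to the LLM prompt as context,
# grounding the generation and reducing hallucination.
```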
In the multimodal domain, existing models often target a single task (such as Stable Diffusion for text-to-image and Sora for text-to-video) but lack cohesive integration across modalities. In response, the ZhiYuan Institute unveiled Emu3, a next-generation multimodal world model designed for comprehensive, end-to-end functionality. Emu3 uses the institute's own multimodal autoregressive technology to train jointly on images, videos, and text, allowing it to both accept and produce content in any of these formats. This enables high-quality content generation, video continuation, and a stronger understanding of the physical world. The Emu3 model will be gradually open-sourced following security evaluations.
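For readers unfamiliar with the multimodal autoregressive approach, the toy sketch below illustrates the general idea: visual content is quantized into discrete codes and spliced into the text stream, so a single next-token predictor trains over one shared vocabulary. This is purely conceptual; Emu3's actual tokenizer, vocabulary, and architecture are not described in the announcement, and every name and number here is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_VOCAB, VISUAL_VOCAB = 32_000, 8_192      # assumed vocabulary sizes
BOI, EOI = TEXT_VOCAB, TEXT_VOCAB + 1         # hypothetical begin/end-of-image markers
VOCAB = TEXT_VOCAB + 2 + VISUAL_VOCAB         # one shared vocabulary for all modalities

def interleave(text_ids: torch.Tensor, image_codes: torch.Tensor) -> torch.Tensor:
    """Splice quantized image codes into the text stream as ordinary tokens."""
    image_ids = image_codes + TEXT_VOCAB + 2  # shift visual codes past the text range
    return torch.cat([text_ids, torch.tensor([BOI]), image_ids, torch.tensor([EOI])])

# Stand-in for a decoder-only transformer: any next-token predictor over the
# shared vocabulary trains the same way, regardless of which tokens are "visual".
model = nn.Sequential(nn.Embedding(VOCAB, 256), nn.Linear(256, VOCAB))

seq = interleave(torch.randint(0, TEXT_VOCAB, (16,)),    # fake caption tokens
                 torch.randint(0, VISUAL_VOCAB, (64,)))  # fake quantized image patches
logits = model(seq[:-1])                                 # predict each next token
loss = F.cross_entropy(logits, seq[1:])                  # one loss covers both modalities
print(float(loss))
```

Because generation is the same next-token loop for every modality, such a model can continue a video, caption an image, or emit an image from text without separate task-specific heads, which is the unification the announcement emphasizes.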
To foster global development in large model research, the ZhiYuan Institute launched FlagOpen 2.0, a full-stack open-source technology platform that supports heterogeneous chips and multiple deep learning frameworks. Wang Zhongyuan stated that the artificial intelligence landscape will change significantly over the next two to three years. In his view, China's large models are still working toward parity with GPT-4, and reaching that benchmark is essential because a model's generalized understanding, reasoning, and overall intelligence determine how much value it can deliver to industry.
Reflecting on the past year, Wang noted that while large models are gaining traction in tech circles, everyday users have yet to fully experience their benefits. This is primarily due to current limitations in the capabilities of China's large models and a B2B ecosystem that is still taking shape. Still, he remains optimistic, emphasizing that the models have reached usability and that continued improvement of their capabilities is now the priority.
When asked about potential breakthrough applications, Wang anticipated a significant rise in both B2B and B2C applications over the next few years as China's large models mature.