2024 Beijing Zhiyuan Conference Overview
On June 14-15, the "2024 Beijing Zhiyuan Conference" convened in Beijing, featuring keynotes and panel discussions on the state of AI. Aditya Ramesh, who leads OpenAI's Sora team, shared insights into the technology, and a lively dialogue unfolded between Kai-Fu Lee, CEO of 01.AI, and Zhang Yaqin, an academician of the Chinese Academy of Engineering. The gathering drew participation from China's leading large-model startups.
Established in November 2018 with support from the Ministry of Science and Technology and the Beijing municipal government, the Zhiyuan Institute (Beijing Academy of Artificial Intelligence, BAAI) aims to advance AI research and development. In 2023, leadership transitioned from Huang Tiejun to Wang Zhongyuan, formerly vice president of technology at Kuaishou. Dubbed the "AI Spring Festival Gala," the annual conference serves as a premier venue for discussing AI innovations.
During the conference, Kang Xiangwu, deputy director of the Strategic Planning Department at the Ministry of Science and Technology, said that AI is at the onset of a broad technological shift, ushering in a new era in which multiple forms of intelligence converge. While this transformation promises profound societal advances, it also raises numerous security concerns; a major point of global discussion is how to ensure safety while fostering coexistence so that the technology ultimately benefits humanity.
Wang Zhongyuan highlighted the rapid progress of China's large-model technology over the past year. In 2023, he noted, the industry viewed Chinese models as lagging behind GPT-3.5; today the best Chinese models have surpassed GPT-3.5 and are approaching GPT-4, and in certain language contexts they even outperform it. GPT-4 itself continues to evolve, however: the latest update, GPT-4o, brings significant gains in performance and efficiency, a reminder that Chinese large models are still in a catch-up phase.
Wang shared the Zhiyuan Institute's advances in language models, multimodal models, embodied intelligence, and biological computing. A collaboration with China Telecom's AI Research Institute produced Tele-FLM-1T, billed as the world's first low-carbon trillion-parameter language model. To address issues such as model hallucination, the institute developed the BGE (BAAI General Embedding) series of semantic vector models, which let systems retrieve supporting text to ground a model's answers. It also launched Emu3, a next-generation multimodal model that aims to unify understanding and generation in a single large model.
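The BGE models are openly released, so the usual workflow is easy to sketch: embed a corpus and a query, retrieve the closest passage by cosine similarity, and feed the retrieved text to a language model to ground its answer. The snippet below is a minimal illustration of that pattern using the sentence-transformers package; the checkpoint name and toy corpus are illustrative choices, not details from the conference.

```python
# Minimal retrieval sketch with a BGE semantic vector model.
# Assumes the sentence-transformers package is installed; the
# checkpoint is one of the publicly released BGE models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Toy document store; in practice this would be a real corpus.
docs = [
    "Tele-FLM-1T is a trillion-parameter language model.",
    "Emu3 unifies multimodal understanding and generation.",
    "BGE is a series of general-purpose text embedding models.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

# Embed the query and pick the closest passage by cosine similarity;
# the retrieved text would then be passed to an LLM as grounding.
query = "What does BGE stand for?"
q_vec = model.encode(query, normalize_embeddings=True)
scores = util.cos_sim(q_vec, doc_vecs)[0]
best = scores.argmax().item()
print(docs[best], float(scores[best]))
```

Grounding generation in retrieved passages of this kind is a standard mitigation for hallucination, since the model's answer can be checked against concrete source text.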
Wang stated that Chinese large models are now usable but not yet truly good to use. In specific scenarios they can iterate rapidly and narrow the gap with GPT-4, but significant challenges persist, including limited computing resources and unsolved core-algorithm and engineering problems, especially in interconnecting clusters of more than 10,000 GPUs.
The Scaling Law emerged as a focal point of discussion. Kai-Fu Lee explained that AI 2.0 represents a technological and platform revolution in which the Scaling Law plays a crucial role: increasing compute and data continues to enhance the intelligence of large models, a principle that has been validated repeatedly but whose limits have not yet been reached.
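For reference, a common formalization from the scaling-law literature (the Chinchilla analysis of Hoffmann et al., not something presented at the conference) writes pretraining loss as a power law in model size and data size:

```latex
% Chinchilla-style scaling law: loss falls as a power law in
% parameter count N and training tokens D, down to an irreducible
% term E; A, B, alpha, and beta are empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under a fixed compute budget, this form implies an optimal trade-off between model size and training data, which is why adding compute and data keeps improving models until the irreducible term dominates.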
Yang Zhilin, CEO of Moonshot AI, pointed out that while scaling models is critical, data quality and availability remain challenges. Zhang Peng, CEO of Zhipu AI, noted that although the Scaling Law remains a dynamic and effective principle, there is no consensus on how far it can be pushed.
The conversation also turned to the future of AI in the AGI era. Wang Zhongyuan observed that research has so far focused on large language models, which operate on a single modality, text. The abundance of multimodal data, spanning images, video, and audio, could shift this trend: multimodal models that understand and interact with the physical world may pave the way for major advances in embodied AI and AI for science.
Aditya Ramesh stressed the importance of language models in building intelligent systems with reasoning capabilities. He noted that pairing language with visual signals makes it possible to simulate a wide range of phenomena, and that as models grow larger their reliance on language may diminish. Recent work in this direction includes Luma AI's Dream Machine and Kuaishou's Kling video-generation models.
Ramesh also voiced concerns about the societal implications of video-generation models, advocating responsible use of Sora to counter misinformation. He said the Sora team is working to improve controllability and reduce randomness in outputs, guided by feedback from collaborators.
AI safety was another prominent topic. Yang Zhilin argued that the rising computational demands of the Scaling Law make advance preparation for AI safety essential. He described AI safety as having two sides: guarding against malicious use and getting right the foundational principles that govern model behavior. Li Dahai observed that today's large models are effectively read-only, since their weights are frozen after training; once models begin to update their weights dynamically, particularly in robotics, safety concerns will escalate.
Turning to recent market competition, Wang Xiaochuan noted that falling prices are enabling broader participation by individuals and businesses, and are prompting many firms to pivot from developing their own models to using existing ones, reducing wasted resources.
Looking ahead, Wang Zhongyuan expects a surge of practical applications of Chinese large models within the next two to three years as the models mature. He urged patience on the road to successful consumer applications, drawing a parallel to the mobile-internet boom, and said the industry remains optimistic that consumer breakthroughs will come once the enabling conditions and technological capabilities are in place.