The United States is rapidly advancing in artificial intelligence (AI) by leveraging its strengths in hardware and software. Tech entrepreneur Elon Musk recently announced on social media that his AI startup xAI has begun training on its "Memphis Supercluster," a massive system of 100,000 H100 GPUs that he claims is the most powerful AI training cluster in the world. This raises a question: should China follow the technological path set by the United States?
During the 2024 China Computing Power Development Expert Seminar, hosted by the China Intelligent Computing Industry Alliance and the National Information Standards Committee's Computing Power Standards Working Group, numerous experts shared their insights.
Chen Runsheng, an academician of the Chinese Academy of Sciences, emphasized that "large AI models represent a new type of productivity, and the integration of large models with supercomputing is crucial. China must consider how to strategically position itself in this area." Zhang Yunquan, a researcher at the Institute of Computing Technology of the Chinese Academy of Sciences, noted that large models are developing rapidly while AI faces a computing power bottleneck. Given China's strong technological foundation in supercomputing, he expressed hope that integrating supercomputing with intelligent computing could resolve these challenges.
Shan Zhiguang, head of the Informatization and Industrial Development Department at the National Information Center, remarked that super-intelligent integration emerged from the diversification of computing power applications and explores whether hybrid computing resources can meet varying application demands.
Predicting the future trajectory of super-intelligent integration, academician Qian Depei outlined three stages: "for AI," "by AI," and "being AI." These stages progress from hardware that supports AI toward systems that are themselves intelligent, with each stage building on the last to advance AI technology.
In the "for AI" stage, the focus will be on upgrading existing computer systems and developing specialized hardware to efficiently support AI tasks. The "by AI" stage will see traditional computing transformed through AI methods, affecting both the approach to supercomputing problems and the structure of traditional computers. Finally, the "being AI" stage anticipates a future where computer systems will inherently possess intelligent characteristics, making AI a core attribute rather than an added capability.
Chen Runsheng noted that the scientific and industrial communities are actively exploring how to integrate supercomputing with intelligent computing. He cited NVIDIA's latest GB200 architecture, which pairs CPUs with GPUs to leverage the strengths of both types of computing, though he cautioned that this approach does not fundamentally resolve efficiency issues.
Zheng Weimin, an academician of the Chinese Academy of Engineering, highlighted that every phase of large model development (training, fine-tuning, and inference) relies heavily on computing power, which accounts for a large share of overall costs, especially during training. Computing power is therefore a critical factor in the development of large models; the rough estimate below gives a sense of the scale involved.
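To see why training dominates costs, a widely used approximation puts training compute at about 6 × N × D floating-point operations for a model with N parameters trained on D tokens. The sketch below applies that rule; the model size, token count, utilization rate, and GPU-hour price are illustrative assumptions, not figures cited at the seminar.

```python
# Back-of-envelope estimate of large-model training compute and cost.
# All figures below are illustrative assumptions, not numbers from the seminar.

params = 70e9            # assumed model size: 70 billion parameters
tokens = 2e12            # assumed training corpus: 2 trillion tokens
peak_flops = 989e12      # H100 peak dense BF16 throughput, ~989 TFLOPS
utilization = 0.4        # assumed fraction of peak throughput actually achieved
usd_per_gpu_hour = 2.0   # assumed rental cost per GPU-hour

# Common approximation: training compute ~= 6 * parameters * tokens FLOPs.
train_flops = 6 * params * tokens

gpu_seconds = train_flops / (peak_flops * utilization)
gpu_hours = gpu_seconds / 3600

print(f"Training compute: {train_flops:.2e} FLOPs")
print(f"GPU-hours:        {gpu_hours:,.0f}")
print(f"Estimated cost:   ${gpu_hours * usd_per_gpu_hour:,.0f}")
```

Even under these assumptions, a single training run consumes several hundred thousand GPU-hours and costs on the order of a million dollars, before counting the repeated runs and failed experiments that real projects require.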
Experts emphasized that China's approach should be tailored to its own circumstances rather than strictly imitating the U.S. Qian Depei pointed out that while China has produced many large models, it still faces hardware constraints, and both the quality and quantity of its training data are often insufficient.
Chen Runsheng argued that the foundational theories underpinning large models need significant advances, as current developments only scratch the surface of what is possible. He stressed that intelligent computing should model how the human brain functions, noting that the brain is remarkably efficient compared with current AI systems, whose energy demands can rival those of entire cities.
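A back-of-envelope comparison makes this gap concrete. The human brain draws roughly 20 W; the GPU count below mirrors the 100,000-H100 cluster mentioned earlier, while the per-GPU power and the overhead factor are assumptions for illustration.

```python
# Rough comparison of the human brain's power draw with a large GPU cluster's.
# The GPU count mirrors the 100,000-H100 figure above; other values are assumptions.

brain_watts = 20      # human brain draws roughly 20 W
num_gpus = 100_000    # cluster size cited for the Memphis Supercluster
watts_per_gpu = 700   # H100 SXM board power, ~700 W
overhead = 1.5        # assumed factor for cooling, networking, and host CPUs

cluster_watts = num_gpus * watts_per_gpu * overhead

print(f"Cluster draw:      {cluster_watts / 1e6:.0f} MW")  # ~105 MW
print(f"Brain equivalents: {cluster_watts / brain_watts:.1e}")
```

On these assumptions the cluster draws on the order of 100 MW, comparable to a small city's electricity demand and six to seven orders of magnitude more than a single brain.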
Yuan Guoxing from the Beijing Institute of Applied Physics and Computational Mathematics stated that no single large model could meet the diverse needs across various industries. Different applications demand distinct technologies and algorithms with varied accuracy and computational power requirements.
With the U.S. restricting access to high-end GPUs and to collaboration opportunities, Zhang Yunquan argued that China needs to develop its own supercomputers suited to large models, overcoming hurdles in energy consumption, reliability, and parallel scaling. He suggested leveraging China's extensive experience in supercomputing and its substantial investments in intelligent computing to build a framework responsive to the demands of large models, ensuring the country keeps pace with global advancements.
The proposed "sovereign-level large model" plan calls for collaboration among national supercomputing institutes, leading universities, chip companies, and large model solution providers to create an open organization akin to OpenAI, one that would conduct non-profit research while also commercializing the resulting sovereign-level large model.
Chen Runsheng concluded that given China's current conditions and the inevitability of advancing large models, it is unrealistic to rely solely on Western methods. Finding an independent path for developing sovereign-level large models is essential for China's future in AI.