With the growing popularity of ChatGPT, the AI sector is once again in the spotlight. In China, a fierce competition known as the "Battle of the Models" has emerged, as developers race to turn the capabilities of large models into new breakthroughs in AI technology. Among these efforts is MathEval, an authoritative benchmark centered on mathematical abilities, which has comprehensively evaluated 30 large models and drawn significant attention from the community.
After intense competition, Baidu's Wenxin Yiyan 4.0, Xueersi's Jiuzhang, and iFlytek’s Spark V3.5 emerged as the top three performers, showcasing their impressive prowess in the field of AI technology. Their remarkable achievements not only highlight their technical strengths but also set new industry standards, driving innovation and development in AI.
As of October last year, more than 200 large models had emerged in China and were in use across a variety of applications, particularly in mathematics. These models play an indispensable role in solving everyday math problems, conducting in-depth data analysis, and assisting in academic research and educational guidance. Both general-purpose and specialized large models are demonstrating formidable mathematical capabilities, injecting new vitality into multiple sectors.
To thoroughly assess the strengths of these large models in mathematics, the National New Generation Artificial Intelligence Open Innovation Platform partnered with several universities, including Jinan University and Beijing Normal University, to initiate the MathEval benchmark. The assessment examines the models' problem-solving abilities in arithmetic, school competition problems at the junior and senior levels, and certain branches of advanced mathematics, providing a more precise and comprehensive evaluation standard for the application of large models in this field.
The MathEval project has compiled 19 mathematics assessment datasets published since 2010, sourced from public data at top international AI conferences such as ACL, AAAI, and ICLR. These datasets cover a wide range of grade levels, question types, text formats, and difficulty levels, a breadth that is crucial for a holistic evaluation of mathematical abilities.
During the assessment process, the MathEval team rigorously tested the 30 large models, using GPT-4 for answer extraction and matching. Compared with rule-based evaluation, which can misread answers that are phrased or formatted in unexpected ways, this approach reduced scoring errors and improved the accuracy and reliability of the results.
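To make the limitation of rule-based evaluation concrete, here is a minimal Python sketch. It is not MathEval's actual pipeline, which has not been described in that level of detail here. The function names (`rule_based_extract`, `answers_match`, `score`) and the sample responses are hypothetical, chosen only for illustration. The sketch shows a naive rule (take the last number in the response), a matcher that treats equivalent numeric forms such as 1/2 and 0.5 as equal, and a scoring function whose extractor is pluggable, so a model-based extractor could in principle be swapped in.

```python
import re
from fractions import Fraction
from typing import Callable, List, Optional

# Matches integers, decimals, and simple fractions such as 12, -3.5, or 1/2.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:/\d+)?")


def rule_based_extract(response: str) -> Optional[str]:
    """Naive rule (an assumption for illustration): the last number is the answer."""
    matches = NUMBER.findall(response)
    return matches[-1] if matches else None


def to_fraction(text: str) -> Optional[Fraction]:
    """Parse a numeric string exactly; return None if it is not a plain number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def answers_match(predicted: Optional[str], reference: str) -> bool:
    """Compare numerically when both sides parse, otherwise compare as text."""
    if predicted is None:
        return False
    p, r = to_fraction(predicted), to_fraction(reference)
    if p is not None and r is not None:
        return p == r
    return predicted.strip() == reference.strip()


def score(
    responses: List[str],
    references: List[str],
    extract: Callable[[str], Optional[str]] = rule_based_extract,
) -> float:
    """Fraction of responses whose extracted answer matches the reference."""
    correct = sum(
        answers_match(extract(resp), ref)
        for resp, ref in zip(responses, references)
    )
    return correct / len(references)


# Hypothetical model outputs; both are mathematically correct.
RESPONSES = [
    "So the answer is 1/2.",
    "The answer is 12. We can check this in 2 steps.",
]
REFERENCES = ["0.5", "12"]

print(score(RESPONSES, REFERENCES))  # prints 0.5
```

Both sample responses contain the right answer, yet the rule-based extractor scores only one of them: in the second response it picks up the trailing "2" instead of "12". Errors of this kind are what a model-based extractor is meant to avoid, at the cost of depending on the extracting model's own reliability.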
The Xueersi Jiuzhang model, focused on problem-solving and algorithmic support, achieved outstanding results in the evaluation. Xueersi’s research and development investment in this area has surpassed 1 billion yuan, demonstrating its strong capabilities and commitment to large model development. The success of the Jiuzhang model is not accidental: its problem-solving ability and professional algorithm support have made it one of the leaders in the industry.
As a trailblazing enterprise dedicated to integrating advanced technology with education, Xueersi provides students with efficient and precise mathematical learning tools through the Jiuzhang model, making significant contributions to the innovation and development of mathematics education. Thanks to the ongoing efforts and relentless exploration by leading companies like Xueersi, China’s large models continue to break new ground, revealing diverse potential and possibilities. These companies are paving the way for the future development of large models in China, leaving a lasting impact on the industry.