If you haven't yet heard of Qwen2, that's about to change: its newest release is set to redefine math applications in software development, engineering, and STEM fields.
What is Qwen2?
The landscape of AI is rapidly evolving, making it challenging even for tech enthusiasts to stay updated.
Qwen2 is an open-source large language model (LLM) family from Alibaba Cloud, positioned as a competitor to OpenAI's GPT series, Meta's Llama models, and Anthropic's Claude family. Launched under the sub-brand "Tongyi Qianwen" in August 2023, the original Qwen family includes models such as Qwen-1.8B, Qwen-7B, and Qwen-72B, with 1.8 billion, 7 billion, and 72 billion parameters, respectively. By early June 2024, Qwen2 arrived in a suite of five variants: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. To date, Alibaba has released over 100 AI models in the Qwen family.
In its first year, professional adoption in China was impressive, with over 90,000 enterprises integrating Qwen models into their operations. Although these models achieved state-of-the-art performance initially, the fast-paced AI environment soon led to their performance being overshadowed—until now.
What is Qwen2-Math?
Today, Alibaba Cloud unveiled Qwen2-Math, a series of math-specific large language models that, at launch, support English only. Notably, the 72-billion-parameter variant, Qwen2-Math-72B-Instruct, exhibits superior capabilities, outperforming leading models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Math-Specialized Gemini 1.5 Pro.
For instance, Qwen2-Math-72B-Instruct scored 84% on the MATH benchmark, which evaluates models on 12,500 challenging competition mathematics problems. Even far simpler questions routinely stump general-purpose LLMs; a famous example is determining which is greater, 9.9 or 9.11.
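To see why this failure is striking, note that ordinary numeric comparison handles the question instantly. A minimal Python sketch, using the standard library's `decimal` module to keep the comparison exact:

```python
# Compare 9.9 and 9.11 as exact decimals: 9.9 = 9.90,
# which is greater than 9.11 -- the comparison that
# famously trips up general-purpose LLMs.
from decimal import Decimal

a, b = Decimal("9.9"), Decimal("9.11")
print(a > b)      # → True
print(max(a, b))  # → 9.9
```

The trap is that "9.11" looks larger if the digits are read as version numbers or dates rather than as a decimal fraction, which is roughly the mistake LLMs make.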
Interestingly, Qwen2-Math-72B-Instruct also excels on the grade-school math benchmark GSM8K, achieving a remarkable 96.7% accuracy, while scoring 47.8% on collegiate-level math.
While Alibaba did not directly compare against Microsoft's Orca-Math model, released in February 2024, it's worth noting that the 7-billion-parameter Orca-Math comes close to Qwen2-Math-7B-Instruct on GSM8K, scoring 86.81% versus 89.9%. Even the smallest Qwen2-Math variant, at 1.5 billion parameters, posts strong results: 84.2% on GSM8K and 44.2% on college math.
What Are Math AI Models Good For?
While initial applications of LLMs focused on chatbots for customer service and document drafting, math-specific LLMs aim to deliver reliable solutions for routine calculations and numerical problem-solving.
Although LLMs are built on mathematics, they have often been less reliable at mathematical problem-solving than earlier generations of specialized AI systems. The creators of Qwen2-Math hope their model will significantly aid in solving complex mathematical challenges.
Notably, while Qwen2-Math is not released under a fully open-source license, its terms permit commercial use by enterprises with fewer than 100 million monthly active users, enabling startups and many companies to use Qwen2-Math at no cost.
This innovative model is set to enhance the capabilities of professionals and businesses in tackling mathematical problems effectively.