Why Does ChatGPT Struggle with Math? Exploring Its Limitations

If you’ve ever attempted to use ChatGPT as a calculator, you likely discovered its struggle with arithmetic—it often fumbles basic math. This issue isn’t exclusive to ChatGPT; other AI models face similar challenges as well.

Anthropic’s Claude falters at simple word problems, Gemini doesn't grasp quadratic equations, and Meta's Llama has difficulty with basic addition.

So, how is it possible that these advanced chatbots can produce eloquent text yet stumble over elementary calculations?

The answer lies partly in tokenization. This process, which involves breaking down data into manageable chunks (for example, dividing the word "fantastic" into its syllables "fan," "tas," and "tic"), helps AI process information. However, tokenizers have no notion of what a number is, so they can split digits inconsistently and obscure the relationships between them. For instance, a tokenizer may treat "380" as a single unit but break down "381" into two segments: "38" and "1."
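To make the "380" versus "381" split concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary and the `toy_tokenize` function are invented for illustration; real tokenizers are trained on data and behave differently in detail, but the inconsistency arises in a similar way.

```python
# Toy vocabulary: made up for this example, not taken from any real tokenizer.
# It happens to contain "380" as a whole entry, but not "381".
TOY_VOCAB = {"380", "38", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedily taking the longest vocabulary entry at each position."""
    tokens = []
    max_len = max(len(entry) for entry in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No entry matched: emit the single character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("380"))  # ['380']
print(toy_tokenize("381"))  # ['38', '1']
```

Two numbers that differ by one end up with different shapes, so whatever the model learns about the token "380" does not carry over directly to "381".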

Yet, tokenization isn’t the sole reason why mathematics is a weak point for AI.

AI systems function as statistical models, learning from massive datasets to recognize patterns and make predictions. For instance, when faced with a multiplication problem like 57,897 x 12,832, ChatGPT draws on the many similar problems it has seen. It might infer that the product of a number ending in “7” and a number ending in “2” will end in “4.” However, the bot often gets the rest of the digits wrong. In my tests, ChatGPT gave an answer of 742,021,104 when the actual product is 742,934,304.
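The arithmetic in that example can be checked directly. The sketch below reads the operands as 57,897 and 12,832, the pair whose product matches the 742,934,304 figure quoted above. The variable names are only illustrative.

```python
a, b = 57_897, 12_832

exact = a * b            # Python integers are exact at any size
reported = 742_021_104   # the answer the article says ChatGPT gave

# The "pattern" shortcut: the last digit of a product depends only on
# the last digits of the two factors (7 * 2 = 14, so the product ends in 4).
last_digit_guess = (a % 10) * (b % 10) % 10

print(exact)                         # 742934304
print(last_digit_guess)              # 4
print(reported % 10 == exact % 10)   # True: the final digit was right
print(exact - reported)              # 913200: the middle digits were not
```

Getting the final digit right takes one local rule. Getting the middle digits right requires carrying information across the whole calculation, which is where the reported answer diverges.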

Yuntian Deng, an assistant professor specializing in AI at the University of Waterloo, conducted a comprehensive study evaluating ChatGPT’s multiplication skills. His findings revealed that the default GPT-4o model struggled to multiply two numbers once they grew beyond roughly four digits each (e.g., 3,459 x 5,284).

“GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy on problems with more than four digits,” Deng explained. “Errors made in any intermediate calculation can compound, leading to inaccurate final results.”
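Deng's point about intermediate errors can be illustrated with schoolbook long multiplication, where the answer is the sum of several partial products. This is only a sketch of the pencil-and-paper method with one simulated slip; it is not a claim about how GPT-4o computes internally, and `partial_products` is a hypothetical helper.

```python
def partial_products(a, b):
    """Return the schoolbook partial products of a * b, one per digit of b."""
    parts = []
    for position, digit_char in enumerate(reversed(str(b))):
        # Multiply a by one digit of b, shifted to that digit's place value.
        parts.append(a * int(digit_char) * 10 ** position)
    return parts


a, b = 3_459, 5_284
parts = partial_products(a, b)
print(parts)       # [13836, 276720, 691800, 17295000]
print(sum(parts))  # 18277356, which equals 3459 * 5284

# Simulate a single slip in one intermediate result (an invented error).
corrupted = parts.copy()
corrupted[2] += 1_000
print(sum(corrupted))  # 18278356: the one slip carries into the final answer
```

Longer operands mean more partial products and more carries, so there are more places for a single slip to occur, and any one of them makes the final result wrong.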

So, will ChatGPT ever master math? Is there a chance that it could rival human proficiency—or even a TI-84 calculator?

Deng remains optimistic. In his research, he also assessed OpenAI’s “reasoning” model, known as o1, which has been integrated into ChatGPT. The o1 model, which approaches problems step by step before providing answers, performed significantly better than GPT-4o, accurately solving up to half of nine-digit multiplication problems.

“The model might be tackling the problems in ways that differ from our manual methods,” Deng noted. “This raises intriguing questions about the model's internal logic and how it contrasts with human reasoning.”

Deng posits that advancements suggest certain math problems—especially multiplication—will eventually be “fully resolved” by AI models like ChatGPT. “This is a clearly defined task with established algorithms,” he added. “We’re observing noticeable improvements from GPT-4o to o1, indicating progress in reasoning capabilities.”

Just remember, you probably shouldn’t toss out your calculator just yet.
