Why Does ChatGPT Struggle with Math? Exploring Its Limitations


Updated on October 19, 2024

If you’ve ever attempted to use ChatGPT as a calculator, you likely discovered its struggle with arithmetic—it often fumbles basic math. This issue isn’t exclusive to ChatGPT; other AI models face similar challenges as well.

Anthropic’s Claude falters at simple word problems, Gemini doesn't grasp quadratic equations, and Meta's Llama has difficulty with basic addition.

So, how is it possible that these advanced chatbots can produce eloquent text yet stumble over elementary calculations?

The answer lies partly in tokenization. This process breaks data into smaller chunks (for example, dividing the word "fantastic" into the pieces "fan," "tas," and "tic") so that AI can process it. However, tokenizers, the components that segment this data, have no notion of what a number is, so they can split similar numbers inconsistently and obscure the relationships between digits. For instance, a tokenizer may treat "380" as a single unit but break "381" into two segments: "38" and "1."
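To make the "380" versus "381" example concrete, here is a minimal sketch of a toy tokenizer. The vocabulary and the greedy longest-match rule are assumptions chosen for illustration; real tokenizers learn far larger vocabularies from data and use more involved merge rules, so this only shows the general effect.

```python
# Hypothetical vocabulary for illustration only. It happens to contain
# "380" as a whole entry but not "381".
TOY_VOCAB = {"380", "38", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}


def greedy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    max_len = max(len(entry) for entry in vocab)
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: emit it alone so the loop always advances.
            tokens.append(text[i])
            i += 1
    return tokens


print(greedy_tokenize("380"))  # ['380']
print(greedy_tokenize("381"))  # ['38', '1']
```

Two numbers that differ by one end up with different shapes: one token versus two. A model that only sees the tokens has to learn from examples that these pieces describe neighboring values.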

Yet, tokenization isn’t the sole reason why mathematics is a weak point for AI.

AI systems are statistical models: they learn patterns from massive datasets and use those patterns to make predictions. Faced with a multiplication problem such as 57,897 x 12,832, ChatGPT draws on the many multiplication problems it has seen before. It may correctly infer that multiplying a number ending in “7” by a number ending in “2” gives a result ending in “4,” yet still get the digits in between wrong. In my tests, ChatGPT gave an answer of 742,021,104 when the actual product is 742,934,304.
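The arithmetic in this example can be checked with a few lines of Python. This sketch reads the operands as 57,897 and 12,832, which is the pair consistent with the product quoted above; the variable names are illustrative.

```python
a, b = 57_897, 12_832

exact = a * b
reported = 742_021_104  # the answer the article says ChatGPT gave

# The last digit of a product depends only on the last digits of the factors.
last_digit_guess = (a % 10) * (b % 10) % 10

print(exact)                                        # 742934304
print(last_digit_guess, exact % 10, reported % 10)  # 4 4 4
print(exact - reported)                             # 913200
```

The wrong answer still ends in 4, which fits the pattern-matching explanation: the easy-to-predict final digit is right, while the middle digits, which depend on every carry, are off by 913,200.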

Yuntian Deng, an assistant professor specializing in AI at the University of Waterloo, conducted a comprehensive study evaluating ChatGPT’s multiplication skills. His findings revealed that the default GPT-4o model struggled to multiply two numbers once each had about four or more digits (e.g., 3,459 x 5,284).

“GPT-4o struggles with multi-digit multiplication, achieving less than 30% accuracy on problems with more than four digits,” Deng explained. “Errors made in any intermediate calculation can compound, leading to inaccurate final results.”
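Deng's point about compounding errors can be illustrated with schoolbook long multiplication, where the answer is the sum of several partial products. The sketch below is an assumption-laden illustration, not a description of how GPT-4o computes internally; the `slip_at` parameter is a hypothetical way to inject a single small mistake into one intermediate row.

```python
def partial_products(a, b):
    """Return a times each digit of b, shifted by that digit's place value."""
    products = []
    for place, digit in enumerate(reversed(str(b))):
        products.append(a * int(digit) * 10 ** place)
    return products


def long_multiply(a, b, slip_at=None):
    """Sum the partial products, optionally corrupting one row (hypothetical slip)."""
    parts = partial_products(a, b)
    if slip_at is not None:
        # Simulate a one-unit error in the digit at this row's place value.
        parts[slip_at] += 10 ** slip_at
    return sum(parts)


print(partial_products(3459, 5284))  # [13836, 276720, 691800, 17295000]
print(long_multiply(3459, 5284))     # 18277356
print(long_multiply(3459, 5284, slip_at=3))  # 18278356
```

A single slip in one row is enough to make the final answer wrong, and longer operands mean more rows and more carries, so there are more chances for one to occur.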

So, will ChatGPT ever master math? Is there a chance that it could rival human proficiency—or even a TI-84 calculator?

Deng remains optimistic. In his research, he also assessed OpenAI’s “reasoning” model, known as o1, which has been integrated into ChatGPT. The o1 model, which approaches problems step by step before providing answers, performed significantly better than GPT-4o, accurately solving up to half of nine-digit multiplication problems.

“The model might be tackling the problems in ways that differ from our manual methods,” Deng noted. “This raises intriguing questions about the model's internal logic and how it contrasts with human reasoning.”

Deng posits that advancements suggest certain math problems—especially multiplication—will eventually be “fully resolved” by AI models like ChatGPT. “This is a clearly defined task with established algorithms,” he added. “We’re observing noticeable improvements from GPT-4o to o1, indicating progress in reasoning capabilities.”

Just remember, you probably shouldn’t toss out your calculator just yet.
