Researchers Challenge AI's "Reasoning" Skills: Models Struggle with Simple Math Problems Due to Minor Changes

How do machine learning models function, and do they "think" or "reason" as humans do? This question straddles both philosophical and practical realms. A newly circulated paper suggests a clear answer, at least for now: “no”.

A team of AI researchers from Apple has published a paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models,” which has sparked discussion since its release on Thursday. While the intricate details delve into symbolic learning and pattern recognition, the core findings are quite straightforward.

Consider this simple math problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he picked on Friday. How many kiwis does Oliver have?

The answer is simple: 44 + 58 + (44 * 2) = 190. Although large language models (LLMs) occasionally struggle with arithmetic, they typically handle straightforward questions like this well.
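If you want to double-check the sum, a few lines of Python will do it; nothing here is assumed beyond the problem statement itself:

```python
# Kiwi totals: Friday + Saturday + Sunday (double Friday's haul).
friday, saturday = 44, 58
sunday = 2 * friday          # "double the number of kiwis he picked on Friday"
total = friday + saturday + sunday
print(total)                 # -> 190
```

But what happens when we introduce an irrelevant detail?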

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

You might think it's the same math problem. A child would understand that a small kiwi is still a kiwi. Yet this additional information trips up even the most advanced LLMs. For instance, here's how OpenAI's o1-mini responds:

“On Sunday, 5 of these kiwis were smaller than average. We need to subtract them from the Sunday total: 88 (Sunday’s kiwis) – 5 (smaller kiwis) = 83 kiwis.”
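This kind of probe is easy to reproduce. Below is a minimal sketch using OpenAI's Python client; the model name is a stand-in for whichever chat model you have access to, and exact responses will vary from run to run:

```python
# Sketch: ask a model the perturbed kiwi problem and inspect its answer.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

question = (
    "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. "
    "On Sunday, he picks double the number of kiwis he did on Friday, "
    "but five of them were a bit smaller than average. "
    "How many kiwis does Oliver have?"
)

response = client.chat.completions.create(
    model="o1-mini",  # stand-in; any chat model works for this probe
    messages=[{"role": "user", "content": question}],
)
print(response.choices[0].message.content)  # the correct total is 190
```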

This is just one of many questions the researchers modified slightly; across the board, the modifications led to significant drops in the models' success rates.

Why is this the case? Why would a model that seems to understand a problem get thrown off by a minor, irrelevant detail? The researchers argue that this reliable failure mode suggests the models don't truly comprehend the problem. Their training data may let them produce the right answer in some situations, but the moment even the slightest actual “reasoning” is required, such as deciding whether to count smaller kiwis, they start producing strange, wrong results.

The researchers state in their paper: “We investigate the fragility of mathematical reasoning in these models and demonstrate that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize this decline arises because current LLMs lack genuine logical reasoning; they merely attempt to replicate the reasoning steps found in their training data.”
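The benchmark behind this finding makes such perturbations systematic: grade-school problems become templates whose names and numbers can be resampled, optionally with an irrelevant clause attached. Here is a minimal sketch of the idea; the template and distractor strings are illustrative, not the paper's actual ones:

```python
# Sketch: generate perturbed variants of a word problem from a template.
# The template and distractor text are illustrative, not the paper's own.
import random

TEMPLATE = (
    "{name} picks {fri} kiwis on Friday. Then he picks {sat} kiwis on Saturday. "
    "On Sunday, he picks double the number of kiwis he picked on Friday."
    "{distractor} How many kiwis does {name} have?"
)
DISTRACTOR = " But {n} of them were a bit smaller than average."

def make_variant(add_distractor: bool) -> tuple[str, int]:
    """Return (question, correct answer); the distractor never changes the answer."""
    name = random.choice(["Oliver", "Liam", "Noah"])
    fri, sat = random.randint(30, 60), random.randint(30, 60)
    extra = DISTRACTOR.format(n=random.randint(2, 9)) if add_distractor else ""
    question = TEMPLATE.format(name=name, fri=fri, sat=sat, distractor=extra)
    return question, fri + sat + 2 * fri

question, answer = make_variant(add_distractor=True)
print(question, "->", answer)
```

Scoring a model is then just a matter of comparing its final number against the computed answer across many sampled variants.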

This observation aligns with attributes commonly ascribed to LLMs. When “I love you” is statistically followed by “I love you, too,” the model can easily mimic that response—but it doesn’t imply it truly feels love. While it can navigate complex reasoning chains previously encountered, this ability falters with even minor deviations, suggesting it replicates observed patterns rather than genuinely reasons.

Co-author Mehrdad Farajtabar gives a good summary of the paper in a thread on X.

An OpenAI researcher expressed respect for the work of lead author Iman Mirzadeh and his colleagues but questioned their conclusions, arguing that correct responses could be achieved through thoughtful prompt engineering. Farajtabar, responding with the collegiality typical of researchers, noted that better prompting might handle simple deviations but could require significantly more context to counter complex distractions, ones a child would trivially point out.
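For the record, the workaround under debate looks something like prepending a warning to the question. The preamble wording below is illustrative, not quoted from the paper or the exchange:

```python
# Sketch: the kind of prompt-engineering patch being debated.
# The preamble wording is illustrative, not from the paper or the X thread.
PREAMBLE = (
    "Solve the word problem below. Some statements may be irrelevant to the "
    "quantity asked for; identify and ignore them before computing.\n\n"
)

question = (
    "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. "
    "On Sunday, he picks double the number of kiwis he did on Friday, "
    "but five of them were a bit smaller than average. "
    "How many kiwis does Oliver have?"
)

prompt = PREAMBLE + question  # send this instead of the bare question
print(prompt)
```

Farajtabar's counterpoint is that patches like this may not scale to distractions the prompt writer didn't anticipate.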

So, do LLMs reason? Perhaps. Can they reason? No one knows for sure. These concepts aren't clearly defined, and the questions sit at the bleeding edge of AI research, where the state of the art changes daily. It's possible LLMs “reason” in ways we don't yet understand or control.

This topic presents a fascinating area of research, but it also raises important questions about how AI is marketed. Can these systems deliver on their promises? And if they can, how exactly do they achieve this? As AI becomes an integral part of daily software, these inquiries transition from academic discussions to real-world considerations.
