By now, large language models (LLMs) like ChatGPT and Claude have become commonplace worldwide. Many individuals fear that AI may threaten their jobs. Ironically, most LLMs struggle with a simple task: counting the number of “r”s in the word “strawberry.” This limitation extends to counting “m”s in “mammal” and “p”s in “hippopotamus.” In this article, I will explore the reasons behind these shortcomings and suggest a straightforward workaround.
LLMs are advanced AI systems trained on extensive text data to comprehend and generate human-like language. They perform well across a variety of tasks, such as answering questions, translating languages, summarizing information, and producing creative writing, by predicting and constructing coherent responses from the input they receive. These models excel at recognizing patterns in text, which enables them to handle a wide range of language-related tasks with remarkable accuracy.
Despite these capabilities, LLMs’ inability to count the “r”s in “strawberry” highlights that they do not think like humans; they process information differently. When asked to count letters, the models fall back on their pattern-matching abilities rather than logical reasoning.
Most high-performance LLMs are built on the transformer architecture, which does not read raw text directly. Instead, text is first converted through a step called tokenization into numerical units known as tokens. Some tokens correspond to entire words (e.g., “monkey”), while others represent fragments of words (e.g., “mon” and “key”). This breakdown enables the model to predict the next token in a sequence efficiently.
LLMs don’t memorize words; they learn how tokens combine across many contexts, which allows them to guess what comes next. For a word like “hippopotamus,” the model might see tokens such as “hip,” “pop,” “o,” and “tamus,” and it never operates on the individual letters that make up the word.
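To see this splitting in action, here is a minimal sketch using the open-source tiktoken library, which exposes the tokenizers used by OpenAI models. The encoding name and the resulting splits are illustrative assumptions; every model family tokenizes differently, so the output may not match the example above.

```python
# pip install tiktoken
import tiktoken

# "cl100k_base" is the encoding used by several recent OpenAI models
# (an assumption for illustration; other models use other vocabularies).
enc = tiktoken.get_encoding("cl100k_base")

for word in ["strawberry", "hippopotamus", "mammal"]:
    token_ids = enc.encode(word)                        # text -> integer token IDs
    pieces = [enc.decode([tid]) for tid in token_ids]   # IDs -> readable fragments
    print(f"{word}: {len(token_ids)} tokens -> {pieces}")
```

Whatever the exact split, the model receives a handful of multi-character chunks rather than a sequence of individual letters, so “how many r’s?” is not a question it can answer just by looking at its input.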
Architectures that operate on individual characters would sidestep this issue, but character-level sequences are much longer and far more expensive to process, so mainstream transformer models work with larger tokens instead. Moreover, LLMs generate output by predicting the next token from the tokens that came before, which makes them ill-suited to tasks like counting letters. When asked how many “r”s appear in “strawberry,” an LLM predicts a plausible answer from patterns in its training data rather than performing an actual count.
Here’s a Workaround
Although LLMs cannot logically reason their way to the count, they excel at working with structured text such as code. For instance, if you ask ChatGPT to use Python to count the “r”s in “strawberry,” it will likely give the correct answer. When an LLM is asked to count or perform other operations that require step-by-step logic, prompting it to work through code can markedly improve its results.
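As a concrete illustration, the counting itself is trivial once it is expressed as code. Below is a minimal sketch of the kind of snippet the model can write and, in environments that execute code (such as ChatGPT’s code-running tools), actually run; the helper name count_letter is just an illustrative choice.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter by inspecting each character directly."""
    return sum(1 for ch in word.lower() if ch == letter.lower())

print(count_letter("strawberry", "r"))    # 3
print(count_letter("hippopotamus", "p"))  # 3
print(count_letter("mammal", "m"))        # 3
```

Prompting the model to produce and run code like this shifts the work from next-token prediction to deterministic execution, which is why the code-based answer is reliable even when the direct answer is not.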
Conclusion
This simple letter-counting experiment exposes a core limitation of LLMs like ChatGPT and Claude. Despite their remarkable ability to generate human-like text, write code, and answer diverse questions, these models do not reason the way humans do. The experiment makes clear that they operate as pattern-matching, predictive algorithms rather than genuinely intelligent agents. Knowing which prompts play to their strengths can help work around these limitations. As AI becomes more deeply woven into our lives, recognizing its constraints is vital for responsible usage and setting realistic expectations.