How many times does the letter "r" appear in the word "strawberry"? According to advanced AI technologies like GPT-4o and Claude, the answer is two.
Large language models (LLMs) can write essays and solve equations in seconds, processing vast amounts of data faster than a human can flip through a book. Yet these seemingly all-knowing AIs occasionally make errors so astonishing that they become viral memes, a reminder that we may not have to submit to our AI overlords just yet.
The struggle of LLMs to grasp the nuances of letters and syllables underscores a crucial reality we often overlook: these systems lack brains. They do not think in the same manner as humans, nor can they replicate human-like intelligence.
Most LLMs are constructed using transformer architecture, a powerful form of deep learning. Transformer models segment text into tokens, which may consist of full words, syllables, or letters based on the specific model being used.
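As a rough illustration of those three granularities, the Python sketch below splits the same text into words, subword pieces, and letters. The function names and the two-entry vocabulary are hypothetical, invented for this example; real tokenizers learn far larger vocabularies from data and use more elaborate matching rules.

```python
# Toy vocabulary of subword pieces (an assumption made for this example).
TOY_VOCAB = {"straw", "berry"}

def word_tokens(text):
    # Word-level: split on whitespace.
    return text.split()

def char_tokens(text):
    # Character-level: every letter is its own token.
    return list(text)

def subword_tokens(word, vocab):
    # Greedy longest match against the vocabulary,
    # falling back to a single character when nothing matches.
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

print(word_tokens("strawberry jam"))
print(subword_tokens("strawberry", TOY_VOCAB))
print(char_tokens("the"))
```

Which granularity a given model uses depends on how its tokenizer was built, so the same word can be cut up differently from one model to the next.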
According to Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, "LLMs are grounded in this transformer framework, which does not actually involve reading text in the traditional sense. When you input a prompt, it translates into a numerical encoding. For instance, when it encounters 'the,' it recognizes its encoding but does not actually understand the letters 'T,' 'H,' and 'E' individually."
This limitation arises because transformers do not operate on raw text; they convert text into numerical representations that carry contextual information, which helps them generate coherent responses. Consequently, while the AI may know that the tokens "straw" and "berry" combine to form "strawberry," it has no direct view of the letter sequence "s," "t," "r," "a," "w," "b," "e," "r," "r," and "y." As a result, it often fails to report the number of letters, or the number of "r"s, in "strawberry."
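To make the point concrete, here is a minimal Python sketch of what the model receives. It assumes, as the example above does, that "strawberry" is split into "straw" and "berry"; the ID numbers 101 and 102 and the helper names are made up for illustration, and real tokenizers may split the word differently.

```python
# Hypothetical token IDs; real vocabularies assign their own numbers.
TOY_IDS = {"straw": 101, "berry": 102}
ID_TO_PIECE = {i: p for p, i in TOY_IDS.items()}

def encode(pieces):
    # Replace each piece of text with its numeric ID.
    return [TOY_IDS[p] for p in pieces]

def decode(ids):
    # Map IDs back to text; only code with access to the
    # vocabulary strings can do this.
    return "".join(ID_TO_PIECE[i] for i in ids)

ids = encode(["straw", "berry"])
print(ids)  # [101, 102] -- no letters are visible in these two numbers

# Counting letters is trivial once the characters are available.
print(decode(ids).count("r"))  # 3
```

The list of two integers is all the model starts from. The letter count is only recoverable here because ordinary code can look up the strings behind the IDs.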
Addressing this issue is challenging, as it lies at the core of the LLM architecture itself.
The complexity increases as LLMs learn multiple languages. For instance, some tokenization methods assume a space between words, yet languages such as Chinese, Japanese, Thai, Lao, and Korean do not utilize spaces in this way. A 2023 study by Google DeepMind's Yennie Jun found that some languages may require up to ten times more tokens than English to express the same meaning.
"Allowing models to analyze characters directly without rigid tokenization might be best, though it's currently impractical for transformer models," explained Sheridan Feucht, a PhD student at Northeastern University who studies LLM interpretability.
In contrast to text generators like ChatGPT, image generators such as Midjourney and DALL-E utilize diffusion models. These models reconstruct images from noise, relying on extensive image datasets to recreate visuals akin to what they learned during training.
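The idea of rebuilding an image from noise can be sketched in a few lines of Python. This toy example assumes one common noising formula from the diffusion literature and treats a list of four numbers as the "image"; the article does not say which formulation Midjourney or DALL-E use, and the function names and values here are hypothetical. A real system trains a neural network to estimate the noise, whereas this sketch simply reuses the true noise to show that the arithmetic reverses.

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    # Forward step: blend the clean signal with Gaussian noise.
    # alpha_bar near 1 keeps most of the signal; near 0 is mostly noise.
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt, eps_hat, alpha_bar):
    # Reverse step: given an estimate of the noise, solve for the clean signal.
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps_hat)]

rng = random.Random(0)
pixels = [0.0, 0.5, 1.0, 0.5]  # stand-in for image data
noisy, true_noise = add_noise(pixels, alpha_bar=0.5, rng=rng)

# A trained model would predict the noise; here the true noise is reused.
restored = remove_noise(noisy, true_noise, alpha_bar=0.5)
print(restored)
```

With a perfect noise estimate the original values come back exactly, up to floating-point error. The quality of a real generator depends on how well its network estimates that noise from what it saw in training.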
Asmelash Teka Hadgu, co-founder of Lesan and a fellow at the DAIR Institute, stated, “Image generators excel at creating recognizable artifacts, like cars and faces, but struggle with smaller details like fingers or handwriting.”
This discrepancy may arise because finer details are less frequently represented in training datasets compared to more prominent features, such as trees with green foliage. Improvements to diffusion models may be more achievable than those for transformer models. For instance, some image generators have become adept at rendering hands by training on additional images of real human hands.
“Just last year, various models produced poor representations of fingers, mirroring the text issue,” Guzdial remarked. “While they may generate a hand with six or seven fingers that looks somewhat convincing, they frequently struggle to structure these elements cohesively.”
Consequently, if you request an AI image generator to design a menu for a Mexican restaurant, you might see standard items like “Tacos,” but you could also encounter creative misspellings such as “Tamilos,” “Enchidaa,” and “Burhiltos.”
As these amusing memes about the spelling of “strawberry” circulate online, OpenAI is reportedly developing a new AI product, codenamed Strawberry, which may have enhanced reasoning capabilities. The effectiveness of LLMs has been hampered by the limited availability of training data. However, Strawberry is expected to generate synthetic data that can improve the performance of OpenAI's LLMs. According to reports, Strawberry can also tackle the New York Times’ Connections word puzzles—tasks that require innovative thinking and pattern recognition—and solve previously unseen math equations.
Meanwhile, Google DeepMind recently introduced AlphaProof and AlphaGeometry 2, AI systems aimed at formal mathematical reasoning, and says they solved four out of six problems from the International Mathematical Olympiad, a performance strong enough to earn a silver medal at the prestigious competition.
It’s a humorous coincidence that memes regarding AI’s difficulty spelling “strawberry” are making the rounds at the same time OpenAI is working on Strawberry. OpenAI CEO Sam Altman even seized the moment to share a picture of a remarkable berry harvest from his garden.