Are AI Models Destined to Hallucinate Forever?

Large language models (LLMs), such as OpenAI’s ChatGPT, share a common weakness: they often generate false information. These inaccuracies range from harmless fabrications, like the bizarre claim that the Golden Gate Bridge was transported across Egypt in 2016, to errors with serious real-world consequences.

For instance, a mayor in Australia recently threatened legal action against OpenAI after ChatGPT incorrectly stated he had admitted guilt in a significant bribery case. Additionally, research has shown that LLM hallucinations can be exploited, enabling the spread of malicious code to unsuspecting software developers. Furthermore, LLMs frequently provide misleading mental health and medical advice, such as falsely suggesting that wine consumption can "prevent cancer."

This issue, known as hallucination, stems from the way contemporary LLMs, and generative AI models more broadly, are developed and trained.

Understanding the Training of Models

Generative AI models do not possess genuine intelligence; they operate as statistical systems that predict words, images, speech, music, and other forms of data. By analyzing vast amounts of publicly available information, these models learn what data is likely to appear based on established patterns, including contextual relationships between data points.

For example, given the email fragment “Looking forward...,” an LLM might complete it with “...to hearing back,” mimicking the pattern found in the thousands of similar emails it encountered during training. That does not mean the model is anticipating anything; it is simply reproducing a statistical pattern.

Sebastian Berns, a Ph.D. researcher at Queen Mary University of London, explained that "the current training framework for LLMs conceals or 'masks' prior words for context, prompting the model to predict the next word." This technique is conceptually akin to using predictive text on smartphones, where users continuously select suggested next words.
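In code, that prediction step amounts to asking the model for a probability distribution over the next token, given everything seen so far. The sketch below illustrates this with an off-the-shelf causal language model; the choice of "gpt2" and the Hugging Face transformers library is illustrative, not something Berns or the article specifies.

```python
# Minimal sketch: a causal LM assigns probabilities to possible next tokens.
# "gpt2" and the transformers library are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Looking forward", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Distribution over the next token, conditioned on the prompt so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>10}  p={prob.item():.3f}")
```

The model simply returns the most statistically plausible continuations; nothing in this step checks whether the resulting sentence is true.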

While this probability-based strategy effectively generates coherent text on a large scale, it does not guarantee accuracy.

Why Models Get Things Wrong

LLMs can produce grammatically correct yet nonsensical output, as seen in the Golden Gate Bridge claim. They may also perpetuate inaccuracies present in their training data or merge information from conflicting sources, including fictional material, without recognizing the contradictions.

It's important to note that LLMs are not malicious; they simply have no concept of truth or falsehood. They have learned associations between words and concepts, and some of those associations are wrong.

“Hallucinations stem from an LLM's inability to gauge the uncertainty of its predictions,” Berns noted. “Typically, an LLM is trained to always generate an output, even with inputs that diverge from its training data. An LLM lacks the capacity to ascertain whether it can reliably respond to a query.”

Addressing Hallucinations in LLMs

The pressing question is whether hallucinations can be resolved. Vu Ha, an applied researcher at the Allen Institute for Artificial Intelligence, believes that while LLMs will always hallucinate, there are practical ways to reduce how often they do, depending on how the models are trained and deployed.

For instance, Ha pointed to question-answering systems built around a curated, high-quality knowledge base: pairing that knowledge base with an LLM lets the system retrieve answers from vetted sources rather than rely on the model's memorized patterns alone, as sketched below.
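A stripped-down version of that pattern, often called retrieval-augmented generation, might look like the following. All names here (the document store, the `retrieve` helper, the `llm` callable) are hypothetical stand-ins; Ha did not describe a specific implementation.

```python
# Hypothetical sketch of answering questions against a curated knowledge base.
# The documents, retrieval heuristic, and `llm` callable are stand-ins.
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    text: str

KNOWLEDGE_BASE = [
    Document("Toolformer", "Toolformer is a 2023 Meta AI paper on teaching "
                           "language models to use external tools..."),
    # ...more vetted, high-quality documents...
]

def retrieve(query: str, k: int = 3) -> list[Document]:
    """Rank documents by naive word overlap; a real system would use vector search."""
    query_words = set(query.lower().split())
    def overlap(doc: Document) -> int:
        return len(query_words & set(doc.text.lower().split()))
    return sorted(KNOWLEDGE_BASE, key=overlap, reverse=True)[:k]

def answer(query: str, llm) -> str:
    """Have the LLM answer only from retrieved passages, not from memory."""
    context = "\n\n".join(f"{d.title}: {d.text}" for d in retrieve(query))
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)  # `llm` is any text-completion callable
```

Because the model is steered toward quoting vetted text rather than its own associations, factual drift is easier to catch, as Ha's comparison below illustrates.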

To illustrate, Ha compared outputs from Microsoft’s LLM-powered Bing Chat and Google’s Bard when asked, “Who are the authors of the Toolformer paper?” Bing Chat accurately listed all eight co-authors from Meta, while Bard mistakenly attributed the paper to researchers at Google and Hugging Face.

“Every deployed LLM will hallucinate,” Ha concluded. “The real issue is whether the benefits outweigh the downsides from these inaccuracies. For instance, if a model occasionally misidentifies dates or names without causing significant harm, it may still provide valuable assistance.”

Techniques for Reducing Hallucinations

Berns also mentioned a strategy known as reinforcement learning from human feedback (RLHF), which has had some success in reducing hallucinations in LLMs. Introduced by OpenAI in 2017, RLHF involves training an LLM, gathering human feedback on its outputs to build a “reward” model, and then using that reward model to fine-tune the LLM's responses.

In the RLHF approach, prompts from a predefined dataset generate new text via an LLM, and human annotators evaluate the output's helpfulness. This feedback informs the development of a reward model that assigns scores to responses based on perceived quality and utility, enabling further refinement of the LLM.
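The core of that reward-modeling step can be sketched as a pairwise preference loss: given two responses to the same prompt and an annotator's choice between them, the reward model learns to score the preferred one higher. The code below is a toy illustration with random embeddings standing in for real annotated data; it is not OpenAI's implementation.

```python
# Toy sketch of the RLHF reward-modeling step: learn to score the response
# a human annotator preferred above the one they rejected.
# Random embeddings stand in for real (prompt, response) representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled response representation to a scalar quality score."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: push the preferred response's score above the rejected one's."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# One toy training step on a batch of 8 annotated response pairs.
chosen, rejected = torch.randn(8, 768), torch.randn(8, 768)
loss = preference_loss(reward_model(chosen), reward_model(rejected))
loss.backward()
optimizer.step()
```

Once trained, the reward model's scores guide a further fine-tuning stage, which is what nudges the LLM toward responses humans rate as helpful.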

OpenAI utilized RLHF to train several models, including GPT-4. Nevertheless, Berns cautioned that RLHF isn't a flawless solution.

“I believe the range of possibilities is too vast to completely 'align' LLMs through RLHF,” Berns remarked. “A common approach in RLHF is to train a model to provide an 'I don’t know' response to challenging questions, predominantly depending on human expertise and hoping the model can apply this general knowledge to its domain. While this occasionally works, it may not always be reliable.”

Embracing Imperfection in LLMs

While the challenge of hallucination may seem daunting, Berns views it optimistically, suggesting that these models can spur creativity by acting as "co-creative partners." They can generate outputs that, while not entirely factual, can offer valuable insights and stimulate novel connections.

“Hallucinations present a problem when generated statements are factually erroneous or clash with social or cultural values—especially in contexts where individuals rely on LLMs for expertise,” he explained. “However, in creative or artistic endeavors, producing unexpected outputs can be advantageous, prompting new lines of thought that lead to unique ideas.”

Ha pointed out that we hold today's LLMs to an unrealistic standard; humans also "hallucinate" by misremembering or misrepresenting facts. With LLMs, he argued, the friction comes from outputs that look correct at first glance but reveal errors on closer inspection.

“Simply put, LLMs, like all AI techniques, are imperfect and prone to mistakes,” he stated. “Typically, we are accepting of AI systems making errors, yet it's more complex when LLMs misstep.”

Ultimately, the solution may not reside in the technical operations of generative AI models. Considering the intricate nature of hallucination, a cautious approach that maintains a critical perspective on model predictions could be the most effective strategy moving forward.
