Why RAG Alone Won't Solve the Hallucination Issue in Generative AI

Hallucinations in Generative AI: Tackling the Challenges for Businesses

Hallucinations, the inaccuracies and outright fabrications produced by generative AI models, pose significant challenges for businesses aiming to incorporate this advanced technology into their operations. These models have no genuine understanding; they simply predict words, images, speech, and other data from statistical patterns learned during training, so they can produce errors that are sometimes glaring. The Wall Street Journal, for instance, recently reported that Microsoft's generative AI invented meeting participants and implied that conference calls covered topics that were never actually discussed.

As previously discussed, hallucinations might be an inherent limitation of contemporary transformer-based model architectures. However, several generative AI providers assert they can mitigate, if not eliminate, these inaccuracies through a technique known as retrieval-augmented generation (RAG).

Take a look at how Squirro describes its approach:

“Our solution centers around Retrieval Augmented LLMs, or Retrieval Augmented Generation (RAG). What sets our generative AI apart is our promise of zero hallucinations. Every piece of information generated is traceable to a credible source.”

Similarly, SiftHub emphasizes its innovation:

“With RAG technology and industry-specific large language models, SiftHub empowers companies to generate personalized responses without hallucinations. This approach ensures enhanced transparency, reduced risk, and fosters complete trust in leveraging AI for various applications.”

RAG was pioneered by Patrick Lewis, a researcher affiliated with Meta and University College London and the lead author of the foundational 2020 paper on the technique. In a RAG setup, documents that are potentially relevant to a question (for example, a Wikipedia entry about the Super Bowl) are retrieved, typically via a keyword search, and the model is then prompted to generate an answer using that additional context.
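To make that retrieve-then-generate loop concrete, here is a minimal sketch in Python. The corpus, the naive keyword scorer, and the final generate() call are illustrative assumptions for this article, not the method from the 2020 paper or any vendor's product.

```python
# Minimal sketch of the RAG pattern: retrieve documents, stuff them into the
# prompt, then generate. All helpers here are illustrative stand-ins.

from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


def keyword_retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    """Score each document by naive keyword overlap with the query and keep the top k."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.text.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Prepend the retrieved passages so the model answers from them, not just parametric memory."""
    context = "\n\n".join(f"[{d.title}]\n{d.text}" for d in docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


# question = "Who won the Super Bowl last year?"
# prompt = build_prompt(question, keyword_retrieve(question, corpus))
# answer = your_llm.generate(prompt)  # hypothetical call to whichever model you deploy
```

In practice the toy keyword scorer would be replaced by a proper search backend (BM25, a vector index, or both), but the shape of the pipeline stays the same: retrieve, assemble a prompt, generate.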

David Wadden, a research scientist at AI2, the Allen Institute for AI, elaborates: “When using a generative AI model such as ChatGPT or LLaMA, the model typically responds from its parametric memory, which encompasses the knowledge garnered from extensive web-based training. However, just as having reference material at hand improves our accuracy, a similar principle applies to AI models.”

RAG proves beneficial as it allows attribution of model outputs to retrieved documents, enhancing factual integrity and mitigating the risk of copyright infringement. Additionally, it enables organizations—especially those in tightly regulated sectors like healthcare and legal—to utilize internal documents securely and temporarily without permanently training a model on them.

Nonetheless, it's crucial to understand that RAG alone cannot eliminate hallucinations entirely and has inherent limitations often overlooked by vendors. Wadden points out that RAG is particularly effective in “knowledge-intensive” scenarios where users seek specific information—like last year's Super Bowl winner—since such documents likely contain pertinent keywords, facilitating easier retrieval.

Conversely, “reasoning-intensive” tasks—such as coding or mathematical queries—pose more complex challenges, as it's difficult to articulate the necessary concepts through keyword searches, complicating the identification of relevant documents.

Even in straightforward inquiries, AI models can become “distracted” by off-topic content in lengthy documents, or they may inexplicably disregard available documents, relying solely on their parametric memory.

Moreover, implementing RAG can be resource-intensive; retrieved documents, regardless of their source, must be stored temporarily in memory for model reference. This demand for memory capacity, coupled with the additional computational power required for processing expanded contexts, significantly raises operational costs. Given the existing criticisms regarding the high computational and energy demands of AI technologies, this is a pressing concern.
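A rough back-of-envelope calculation illustrates that overhead. The token counts and the per-token price below are assumed placeholders, not measured figures for any particular model or provider.

```python
# Back-of-envelope estimate of how retrieved context inflates per-request cost.
# Every number below is an assumption for illustration, not real pricing or a measurement.

PRICE_PER_1K_PROMPT_TOKENS = 0.01   # assumed price per 1,000 prompt tokens, in dollars
QUESTION_TOKENS = 50                # a short user question
DOC_TOKENS = 800                    # assumed average length of one retrieved document
DOCS_PER_QUERY = 5                  # number of documents stuffed into the prompt


def prompt_cost(prompt_tokens: int) -> float:
    """Dollar cost of the prompt portion of one request."""
    return prompt_tokens / 1000 * PRICE_PER_1K_PROMPT_TOKENS


without_rag = prompt_cost(QUESTION_TOKENS)
with_rag = prompt_cost(QUESTION_TOKENS + DOCS_PER_QUERY * DOC_TOKENS)

print(f"Prompt cost without retrieval: ${without_rag:.4f}")
print(f"Prompt cost with retrieval:    ${with_rag:.4f} ({with_rag / without_rag:.0f}x)")
```

Under these assumptions, each request carries roughly eighty times more prompt tokens, and that context must also be held in memory and attended over at inference time, which is where the extra compute and energy cost comes from.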

However, there’s potential for advancing RAG technology. Wadden highlighted ongoing research aimed at enhancing AI models’ interactions with retrieved documents. Some initiatives focus on enabling models to determine when retrieval is necessary, along with improving the indexing of extensive document datasets and refining search capabilities using more sophisticated document representations beyond mere keywords.

“While we're good at keyword-based document retrieval, we still have much to learn about effectively finding documents related to abstract concepts, such as the proof techniques needed for a math problem,” Wadden explained. “Thus, research into developing document representations and search methodologies that identify relevant documents for complex generation tasks remains largely an open problem.”
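One of the directions Wadden describes, moving from keyword matching to dense vector representations of documents, can be sketched as follows. The embed() function here is a toy stand-in (a hashed bag of words) so the example runs end to end; a real system would use a learned text-embedding model and an approximate nearest-neighbor index.

```python
# Sketch of dense (embedding-based) retrieval: compare queries and documents
# in a shared vector space instead of by keyword overlap.

import hashlib
import math

DIM = 256


def embed(text: str) -> list[float]:
    """Toy stand-in for a learned embedding model: a hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Embed the query, compare it to (pre-)embedded documents, and return the k most similar."""
    doc_vectors = [(doc, embed(doc)) for doc in corpus]  # in production, precomputed and indexed
    query_vector = embed(query)
    ranked = sorted(doc_vectors, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

The point of the structure is that queries and documents are compared in the same vector space, so with a genuinely learned embedding a query can match a document that shares none of its keywords, which is exactly the case keyword search struggles with.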

In summary, while RAG can significantly mitigate a model's hallucinations, it isn't a catch-all solution for all AI-related inaccuracies. Be cautious of any vendor claiming otherwise.
