Microsoft Introduces Tool to Address AI Hallucinations, But Experts Urge Caution

Microsoft's New AI Correction Service: Is It The Answer To Hallucinations?

AI is often criticized for getting facts wrong, but Microsoft claims to have a solution: Correction, a new service designed to automatically revise factually incorrect AI-generated text. Understandably, the claim invites some skepticism.

Correction flags AI-generated text that may be erroneous, such as a summary of an earnings call that misquotes a figure, then fact-checks the flagged passages by comparing them against verified sources, for example uploaded transcripts.

Currently in preview, Correction is integrated into Microsoft’s Azure AI Content Safety API and can be utilized with various text-generating AI models, including Meta's Llama and OpenAI's GPT-4. “Correction employs a novel method of leveraging small and large language models to align outputs with reliable documents,” a Microsoft spokesperson shared. “Our goal is to help developers and users in critical areas like healthcare, where the accuracy of AI responses is paramount.”

This summer, Google introduced similar functionality in its Vertex AI platform, allowing customers to ground models using data from various sources, including their own datasets and Google Search results.

However, experts caution that while these grounding techniques may seem promising, they don't address the fundamental issues causing AI hallucinations. "Attempting to eliminate hallucinations from generative AI is akin to trying to extract hydrogen from water," remarked Os Keyes, a PhD candidate at the University of Washington researching the ethical implications of emerging technologies. "It's an intrinsic feature of how these systems function."

Text-generating models "hallucinate" because they lack true understanding. They are statistical systems that recognize patterns in language and predict what comes next based on vast training datasets. As a result, their responses are not factual answers but predictions based on trends observed in their training data. One study has even shown that OpenAI’s ChatGPT provides incorrect answers to medical questions half the time.
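
To make that point concrete, here is a minimal, purely illustrative Python sketch of next-word prediction: a toy bigram model that counts which word tends to follow which and outputs the most frequent continuation. The corpus and function names are invented for illustration; production LLMs are vastly more sophisticated, but they too produce the statistically likely continuation rather than a verified fact.

```python
# Toy bigram "language model" (illustration only): it counts which word tends
# to follow which, then predicts the most common continuation. The prediction
# is a frequency-based guess, not a checked fact.

from collections import Counter, defaultdict

corpus = "the patient takes the medicine daily the patient rests daily".split()

follows: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))      # "patient": the most frequent continuation
print(predict_next("patient"))  # a guess from frequency, not a verified answer
```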

To counter hallucinations, Microsoft has developed two complementary models that work together like a pair of editorial assistants. The first, a classifier model, identifies potentially false or irrelevant sections of AI-generated text. If it detects inaccuracies, it activates a second language model, which attempts to rectify these issues using established “grounding documents.”
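
As a rough illustration of that flag-then-rewrite flow, here is a toy Python sketch. It is not Microsoft's implementation or the Azure API: the simple figure-matching check stands in for the classifier model, and the annotation step stands in for the corrective language model; both stand-ins, and the sample transcript, are assumptions made purely for demonstration.

```python
# Toy flag-then-rewrite pipeline (illustration only, not Microsoft's code).
# Step 1: a stand-in "classifier" flags sentences whose figures are absent
#         from the grounding document.
# Step 2: a stand-in "corrector" marks flagged sentences for revision; a real
#         system would rewrite them with a language model.

import re

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_grounded(sentence: str, grounding_doc: str) -> bool:
    """Stand-in classifier: every figure in the sentence must appear in the source."""
    figures = re.findall(r"\$?\d[\d.,]*%?", sentence)
    return all(fig in grounding_doc for fig in figures)

def correct(generated: str, grounding_doc: str) -> str:
    """Stand-in corrector: annotate unsupported sentences instead of rewriting them."""
    revised = []
    for sentence in split_sentences(generated):
        if is_grounded(sentence, grounding_doc):
            revised.append(sentence)
        else:
            revised.append(f"[UNSUPPORTED, revise against source: {sentence}]")
    return " ".join(revised)

transcript = "Q2 revenue was $4.2 billion, up 8% year over year."
summary = "Q2 revenue was $6.0 billion, up 20% year over year."
print(correct(summary, transcript))
```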

“Correction can greatly improve the trustworthiness of AI-generated content, reducing user dissatisfaction and mitigating reputational risks for developers,” the Microsoft spokesperson stated. “However, it's crucial to understand that grounding detection does not guarantee ‘accuracy’; it simply aligns generative AI outputs with reference documents.”

Keyes remains skeptical. “While it might address some issues, it could also introduce new ones. After all, the hallucination detection system could itself misinterpret information.”

When asked for details about the Correction models’ backgrounds, the Microsoft spokesperson referenced a recent research paper explaining their architecture. However, the paper lacks important information on the datasets used in training the models.

Mike Cook, an AI research fellow at Queen Mary University, believes that even if Correction performs as intended, it could deepen the trust and transparency problems already surrounding AI. The service may catch some errors, but it could also lull users into believing models are more accurate than they really are. "Tech companies like Microsoft, OpenAI, and Google have fostered a reliance on models in contexts where they frequently err," Cook pointed out. "What Microsoft is doing now is repeating that mistake at a larger scale. If this takes accuracy from 90% to 99%, the issue was never really in that 9%; it will always be in the 1% of mistakes we aren't yet detecting."

Additionally, there's a potential business angle to Microsoft’s implementation of Correction. While the feature itself is provided for free, the necessary “groundedness detection” for diagnosing hallucinations is only free for up to 5,000 text records per month. After that, it incurs a cost of 38 cents for every additional 1,000 records.
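
Taking those published figures at face value, a back-of-the-envelope calculation shows how the groundedness-detection charge scales with volume. The snippet below assumes billing in whole 1,000-record blocks beyond the free tier, which is an assumption about billing granularity rather than a documented detail.

```python
# Rough monthly cost estimate for groundedness detection, using the figures
# cited above: first 5,000 text records free, then $0.38 per additional 1,000.
# Billing in whole 1,000-record blocks is an assumption.

import math

FREE_RECORDS = 5_000
PRICE_PER_1K = 0.38  # USD per 1,000 records beyond the free tier

def monthly_cost(records: int) -> float:
    billable = max(0, records - FREE_RECORDS)
    return math.ceil(billable / 1_000) * PRICE_PER_1K

for n in (5_000, 50_000, 1_000_000):
    print(f"{n:>9,} records -> ${monthly_cost(n):,.2f}/month")
# 5,000 -> $0.00; 50,000 -> $17.10; 1,000,000 -> $378.10
```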

Microsoft is under pressure to prove to customers and shareholders that its AI bets will pay off; the company poured nearly $19 billion into AI-related capital expenditures in Q2 alone. Yet significant AI revenue remains elusive, and concerns over the long-term strategy have prompted stock downgrades from Wall Street analysts. Reports indicate that many early adopters have paused deployments of Microsoft's flagship generative AI platform, Microsoft 365 Copilot, over performance and cost concerns. In one case, Copilot invented attendees for a Microsoft Teams meeting and implied the call covered topics that were never actually discussed.

Accuracy, and the potential for hallucinations, are now among the top concerns for businesses piloting AI tools, according to a recent KPMG survey. "If this were a typical product development cycle, generative AI would still be in the R&D phase, focusing on improvement and understanding its capabilities," Cook remarked. "Instead, it has been rushed into multiple industries without thorough vetting. Microsoft and others have loaded everyone onto their ambitious new venture and are building the safety systems while already en route."
