Microsoft Introduces Tool to Address AI Hallucinations, But Experts Urge Caution

Microsoft's New AI Correction Service: Is It The Answer To Hallucinations?

AI is often criticized for getting facts wrong, but Microsoft claims to have a solution: Correction, a new service designed to automatically revise factually incorrect AI-generated text. Understandably, the claim invites some skepticism.

Correction flags AI-generated text that may be erroneous, such as a summary of an earnings call that misquotes a figure, then fact-checks the flagged passages by comparing them against verified sources, for example uploaded transcripts.

Currently in preview, Correction is integrated into Microsoft’s Azure AI Content Safety API and can be utilized with various text-generating AI models, including Meta's Llama and OpenAI's GPT-4. “Correction employs a novel method of leveraging small and large language models to align outputs with reliable documents,” a Microsoft spokesperson shared. “Our goal is to help developers and users in critical areas like healthcare, where the accuracy of AI responses is paramount.”

This summer, Google introduced similar functionality in its Vertex AI platform, allowing customers to ground models using data from various sources, including their own datasets and Google Search results.

However, experts caution that while these grounding techniques may seem promising, they don't address the fundamental issues causing AI hallucinations. "Attempting to eliminate hallucinations from generative AI is akin to trying to extract hydrogen from water," remarked Os Keyes, a PhD candidate at the University of Washington researching the ethical implications of emerging technologies. "It's an intrinsic feature of how these systems function."

Text-generating models "hallucinate" because they lack true understanding. They are statistical systems that recognize patterns in language and predict what comes next based on vast training datasets. As a result, their responses are not factual answers but predictions based on trends observed in their training data. One study has even shown that OpenAI’s ChatGPT provides incorrect answers to medical questions half the time.
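
To make that point concrete, here is a minimal, purely illustrative Python sketch of next-word prediction: a toy bigram model that counts which word tends to follow which and outputs the most frequent continuation. The corpus and function names are invented for illustration; production LLMs are vastly more sophisticated, but they too produce the statistically likely continuation rather than a verified fact.

```python
# Toy bigram "language model" (illustration only): it counts which word tends
# to follow which, then predicts the most common continuation. The prediction
# is a frequency-based guess, not a checked fact.

from collections import Counter, defaultdict

corpus = "the patient takes the medicine daily the patient rests daily".split()

follows: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))      # "patient": the most frequent continuation
print(predict_next("patient"))  # a guess from frequency, not a verified answer
```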

To counter hallucinations, Microsoft has developed two complementary models that work together like a pair of editorial assistants. The first, a classifier model, identifies potentially false or irrelevant sections of AI-generated text. If it detects inaccuracies, it activates a second language model, which attempts to rectify these issues using established “grounding documents.”
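
As a rough illustration of that flag-then-rewrite flow, here is a toy Python sketch. It is not Microsoft's implementation or the Azure API: the simple figure-matching check stands in for the classifier model, and the annotation step stands in for the corrective language model; both stand-ins, and the sample transcript, are assumptions made purely for demonstration.

```python
# Toy flag-then-rewrite pipeline (illustration only, not Microsoft's code).
# Step 1: a stand-in "classifier" flags sentences whose figures are absent
#         from the grounding document.
# Step 2: a stand-in "corrector" marks flagged sentences for revision; a real
#         system would rewrite them with a language model.

import re

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_grounded(sentence: str, grounding_doc: str) -> bool:
    """Stand-in classifier: every figure in the sentence must appear in the source."""
    figures = re.findall(r"\$?\d[\d.,]*%?", sentence)
    return all(fig in grounding_doc for fig in figures)

def correct(generated: str, grounding_doc: str) -> str:
    """Stand-in corrector: annotate unsupported sentences instead of rewriting them."""
    revised = []
    for sentence in split_sentences(generated):
        if is_grounded(sentence, grounding_doc):
            revised.append(sentence)
        else:
            revised.append(f"[UNSUPPORTED, revise against source: {sentence}]")
    return " ".join(revised)

transcript = "Q2 revenue was $4.2 billion, up 8% year over year."
summary = "Q2 revenue was $6.0 billion, up 20% year over year."
print(correct(summary, transcript))
```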

“Correction can greatly improve the trustworthiness of AI-generated content, reducing user dissatisfaction and mitigating reputational risks for developers,” the Microsoft spokesperson stated. “However, it's crucial to understand that grounding detection does not guarantee ‘accuracy’; it simply aligns generative AI outputs with reference documents.”

Keyes remains skeptical. “While it might address some issues, it could also introduce new ones. After all, the hallucination detection system could itself misinterpret information.”

When asked for details about the Correction models’ backgrounds, the Microsoft spokesperson referenced a recent research paper explaining their architecture. However, the paper lacks important information on the datasets used in training the models.

Mike Cook, an AI research fellow at Queen Mary University, believes that even if Correction performs as intended, it could deepen the trust and transparency problems already surrounding AI. The service may catch some errors, but it could also lull users into believing models are more accurate than they really are. "Tech companies like Microsoft, OpenAI, and Google have fostered a reliance on models in contexts where they frequently err," Cook pointed out. "What Microsoft is doing now is repeating that mistake at a larger scale. If this takes accuracy from 90% to 99%, the issue was never really in that 9%; it will always be in the 1% of mistakes we aren't yet detecting."

Additionally, there's a potential business angle to Microsoft’s implementation of Correction. While the feature itself is provided for free, the necessary “groundedness detection” for diagnosing hallucinations is only free for up to 5,000 text records per month. After that, it incurs a cost of 38 cents for every additional 1,000 records.
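
Taking those published figures at face value, a back-of-the-envelope calculation shows how the groundedness-detection charge scales with volume. The snippet below assumes billing in whole 1,000-record blocks beyond the free tier, which is an assumption about billing granularity rather than a documented detail.

```python
# Rough monthly cost estimate for groundedness detection, using the figures
# cited above: first 5,000 text records free, then $0.38 per additional 1,000.
# Billing in whole 1,000-record blocks is an assumption.

import math

FREE_RECORDS = 5_000
PRICE_PER_1K = 0.38  # USD per 1,000 records beyond the free tier

def monthly_cost(records: int) -> float:
    billable = max(0, records - FREE_RECORDS)
    return math.ceil(billable / 1_000) * PRICE_PER_1K

for n in (5_000, 50_000, 1_000_000):
    print(f"{n:>9,} records -> ${monthly_cost(n):,.2f}/month")
# 5,000 -> $0.00; 50,000 -> $17.10; 1,000,000 -> $378.10
```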

Microsoft is under pressure to prove to customers and shareholders that its AI bets will pay off; the company poured nearly $19 billion into AI-related capital expenditures in Q2 alone. Yet significant AI revenue remains elusive, and concerns over the long-term strategy have prompted stock downgrades from Wall Street analysts. Reports indicate that many early adopters have paused deployments of Microsoft's flagship generative AI platform, Microsoft 365 Copilot, over performance and cost concerns. In one case, Copilot invented attendees for a Microsoft Teams meeting and implied the call covered topics that were never actually discussed.

Accuracy, and the potential for hallucinations, are now among the top concerns for businesses piloting AI tools, according to a recent KPMG survey. "If this were a typical product development cycle, generative AI would still be in the R&D phase, focusing on improvement and understanding its capabilities," Cook remarked. "Instead, it has been rushed into multiple industries without thorough vetting. Microsoft and others have loaded everyone onto their ambitious new venture and are building the safety systems while already en route."
