Patronus AI Raises $17M to Combat AI Hallucinations and Copyright Issues, Boosting Enterprise Adoption

As companies rush to adopt generative AI, concerns over the accuracy and safety of large language models (LLMs) threaten to hinder widespread enterprise integration. Addressing these challenges is Patronus AI, a San Francisco startup that recently secured $17 million in Series A funding to automatically identify costly and potentially dangerous LLM errors at scale.

This funding round, which raises Patronus AI’s total investment to $20 million, was spearheaded by Glenn Solomon at Notable Capital, with contributions from Lightspeed Venture Partners, former DoorDash executive Gokul Rajaram, Factorial Capital, Datadog, and several undisclosed tech leaders.

Founded by ex-Meta machine learning experts Anand Kannappan and Rebecca Qian, Patronus AI has built an automated evaluation platform that detects issues such as hallucinations, copyright violations, and safety risks in LLM outputs. Using proprietary AI, the platform scores model performance, stress-tests models with adversarial examples, and supports detailed benchmarking, all without the manual review enterprises typically rely on.
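To make the approach concrete, here is a minimal sketch of what such an automated evaluation loop can look like. Everything in it (the prompts, query_model, judge_output) is a hypothetical illustration, not Patronus AI's actual API; the real platform uses trained evaluation models rather than the toy heuristic shown here.

```python
# Hypothetical sketch of an automated LLM evaluation loop. None of these
# names come from Patronus AI's product; they only illustrate the
# "generate -> judge -> aggregate" pattern described above.
from dataclasses import dataclass

# Adversarial prompts probing known failure modes (hallucination,
# verbatim reproduction of copyrighted text).
ADVERSARIAL_PROMPTS = [
    "What was Acme Corp's 2023 net income? Cite the exact SEC filing.",
    "Recite the first page of a famous novel word for word.",
]

@dataclass
class EvalResult:
    prompt: str
    output: str
    passed: bool
    reason: str

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test."""
    return "stubbed model output"

def judge_output(prompt: str, output: str) -> EvalResult:
    """Placeholder evaluator. In a production system this would be a
    trained evaluation model; here it is a toy keyword heuristic."""
    risky = "word for word" in prompt
    reason = "possible verbatim reproduction" if risky else "ok"
    return EvalResult(prompt, output, passed=not risky, reason=reason)

def run_suite() -> None:
    results = [judge_output(p, query_model(p)) for p in ADVERSARIAL_PROMPTS]
    failures = [r for r in results if not r.passed]
    print(f"{len(failures)}/{len(results)} prompts flagged")
    for r in failures:
        print(f"- {r.prompt!r}: {r.reason}")

if __name__ == "__main__":
    run_suite()
```

Because the judge is itself automated, a suite like this can scale to thousands of prompts with no human reviewers in the loop.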

"Our product excels at catching a variety of mistakes," said Kannappan, CEO of Patronus AI. "This includes hallucinations, copyright issues, safety-related risks, and tailored capabilities for maintaining a brand's style and tone."

The advent of powerful LLMs like OpenAI’s GPT-4 and Meta’s Llama 3 has sparked a competitive race in Silicon Valley to harness this technology's generative capabilities. However, with the excitement have come notable model failures—ranging from error-laden AI-generated articles by CNET to drug discovery firms retracting research papers influenced by LLM inaccuracies.

These failures point to deeper, systemic issues in current LLMs, which Patronus AI aims to address. Its research, including the recently launched "CopyrightCatcher" API and "FinanceBench" benchmark, reveals alarming shortcomings in leading models’ ability to provide accurate, fact-based answers.

In the "FinanceBench" benchmark, Patronus evaluated models like GPT-4 on financial queries using public SEC filings. The results were stark: the top-performing model answered only 19% of questions correctly despite reviewing an entire annual report. A separate evaluation using the "CopyrightCatcher" API discovered that open-source LLMs reproduced copyrighted text verbatim in 44% of cases.

"Even state-of-the-art models struggle with accuracy, performing at only 90% in finance contexts," noted Qian, CTO of Patronus. "Our findings show that open-source models yield over 20% unsafe responses in high-risk areas. Copyright infringement poses a substantial concern; large publishers and media companies must be vigilant."

While other startups like Credo AI and Weights & Biases are developing LLM evaluation tools, Patronus distinguishes itself with a research-first approach. Its core technology centers on training dedicated evaluation models to identify the specific scenarios in which LLMs fail.

"No other company matches our depth of research and technology," Kannappan asserted. "Our strategy is unique, rooted in training evaluation models, pioneering alignment techniques, and publishing research."

Patronus AI has gained traction with several Fortune 500 companies across industries including automotive, education, finance, and software, helping them implement LLMs safely. With the infusion of new capital, Patronus plans to expand its research, engineering, and sales teams while developing additional benchmarks.

If Patronus realizes its vision, automated LLM evaluations could become essential for enterprises, much as security audits helped accelerate cloud adoption. Qian envisions a future in which testing models with Patronus is as routine as unit testing for code.
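As a rough sketch of that analogy, an LLM check can be written to slot into an ordinary test runner such as pytest. The names below are hypothetical illustrations, not a Patronus interface; a real evaluator would score factual grounding with a trained model rather than the substring check used here.

```python
# Hypothetical "unit test for a model": an LLM check written to run under
# pytest alongside ordinary software tests. query_model is a stand-in for
# the model under test, not a real Patronus interface.

def query_model(prompt: str, context: str) -> str:
    """Placeholder for the LLM under test."""
    return "Net income was $12.4B, as stated in the annual report."

def test_answer_is_grounded_in_source():
    source = "Annual report excerpt: net income for the year was $12.4B."
    answer = query_model("What was net income last year?", context=source)
    # Toy grounding check: the figure quoted in the answer must actually
    # appear in the source document, so a hallucinated number fails.
    assert "$12.4B" in answer and "$12.4B" in source
```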

"Our platform is versatile, applicable across various domains, from legal to healthcare," she explained. "We aim to empower enterprises in every industry to harness LLMs while ensuring compliance with their specific requirements."

Validating LLM performance is inherently difficult given the models’ black-box nature and the vast range of possible outputs, but Patronus remains committed to advancing AI evaluation. By pushing the boundaries of automated testing, the company aims to enable the accountable deployment of LLMs in real-world applications.

"Automating LLM performance measurement is challenging due to the diverse range of behaviors these generative models can exhibit," acknowledged Kannappan. "However, our research-driven methodology enables us to reliably and scalably identify errors that manual testing simply cannot."
