Patronus AI Raises $17M to Combat AI Hallucinations and Copyright Issues, Boosting Enterprise Adoption

As companies rush to adopt generative AI, concerns over the accuracy and safety of large language models (LLMs) threaten to hinder widespread enterprise integration. Addressing these challenges is Patronus AI, a San Francisco startup that recently secured $17 million in Series A funding to automatically identify costly and potentially dangerous LLM errors at scale.

This funding round, which brings Patronus AI's total raised to $20 million, was led by Glenn Solomon at Notable Capital, with participation from Lightspeed Venture Partners, former DoorDash executive Gokul Rajaram, Factorial Capital, Datadog, and several undisclosed tech leaders.

Founded by ex-Meta machine learning experts Anand Kannappan and Rebecca Qian, Patronus AI has built an automated evaluation platform designed to detect issues such as hallucinations, copyright violations, and safety risks in LLM outputs. Using proprietary AI, the platform scores model performance, stress-tests models with adversarial examples, and enables detailed benchmarking, all without the manual labor such evaluations typically demand of enterprises.
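To make the idea of automated evaluation concrete, here is a minimal, hypothetical sketch of what one such check might look like. This is not Patronus AI's technology or API; `call_llm` and the grounding heuristic are stand-ins invented for illustration, and real evaluators rely on trained models rather than string matching.

```python
import re

# Hypothetical sketch of an automated LLM-output check. This is NOT the
# Patronus AI platform; `call_llm` stands in for any model client.

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g., GPT-4 or Llama via an API client)."""
    return "Acme Corp reported revenue of $12B in fiscal 2023."

def is_grounded(answer: str, source: str) -> bool:
    """Naive hallucination heuristic: every number in the answer must appear
    verbatim in the source document. Production evaluators use trained
    models, not string matching."""
    answer_numbers = re.findall(r"\$?\d[\d,.]*", answer)
    return all(num in source for num in answer_numbers)

source_doc = "Annual report: Acme Corp reported revenue of $12B in fiscal 2023."
answer = call_llm("What was Acme Corp's revenue in fiscal 2023?")
print("grounded" if is_grounded(answer, source_doc) else "possible hallucination")
```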

"Our product excels at catching a variety of mistakes," said Kannappan, CEO of Patronus AI. "This includes hallucinations, copyright issues, safety-related risks, and tailored capabilities for maintaining a brand's style and tone."

The advent of powerful LLMs like OpenAI’s GPT-4 and Meta’s Llama 3 has sparked a competitive race in Silicon Valley to harness this technology's generative capabilities. However, with the excitement have come notable model failures—ranging from error-laden AI-generated articles by CNET to drug discovery firms retracting research papers influenced by LLM inaccuracies.

These failures highlight deeper, systemic issues within current LLMs, which Patronus AI is keen to address. Their research, including the recently launched "CopyrightCatcher" API and "FinanceBench" benchmark, reveals alarming shortcomings in leading models’ abilities to provide accurate, fact-based answers.

In the "FinanceBench" benchmark, Patronus evaluated models like GPT-4 on financial queries using public SEC filings. The results were stark: the top-performing model answered only 19% of questions correctly despite reviewing an entire annual report. A separate evaluation using the "CopyrightCatcher" API discovered that open-source LLMs reproduced copyrighted text verbatim in 44% of cases.

"Even state-of-the-art models struggle with accuracy, performing at only 90% in finance contexts," noted Qian, CTO of Patronus. "Our findings show that open-source models yield over 20% unsafe responses in high-risk areas. Copyright infringement poses a substantial concern; large publishers and media companies must be vigilant."

While other startups like Credo AI and Weights & Biases are developing LLM evaluation tools, Patronus distinguishes itself with a research-first approach. Their core technology involves training dedicated evaluation models to identify specific scenarios where LLMs may fail.

“No other company matches our depth of research and technology," Kannappan asserted. “Our strategy is unique—rooted in training evaluation models, pioneering alignment techniques, and publishing research."

Patronus AI has gained traction with several Fortune 500 companies across industries including automotive, education, finance, and software, helping them implement LLMs safely. With the infusion of new capital, Patronus plans to expand its research, engineering, and sales teams while developing additional benchmarks.

If Patronus realizes its vision, automated LLM evaluations could become as essential to enterprises as security audits were to accelerating cloud adoption. Qian envisions a future where testing models with Patronus is routine, akin to unit testing for code.
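To ground Qian's unit-testing analogy, here is a hedged sketch of what LLM checks written as ordinary unit tests might look like. `query_model` and both assertions are hypothetical examples, not part of any Patronus product.

```python
import unittest

def query_model(prompt: str) -> str:
    """Hypothetical stand-in; in practice this would call a real LLM client."""
    return "Net income was $4.2B, per the company's 10-K filing."

class TestModelOutputs(unittest.TestCase):
    """Treat model behavior like code under test, per Qian's analogy."""

    def test_answer_cites_the_source_filing(self):
        answer = query_model("What was net income? Cite your source.")
        self.assertIn("10-K", answer)  # require a citation to the filing

    def test_answer_reports_a_dollar_figure(self):
        answer = query_model("What was net income?")
        self.assertRegex(answer, r"\$\d")  # require a concrete dollar amount

if __name__ == "__main__":
    unittest.main()
```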

"Our platform is versatile, applicable across various domains, from legal to healthcare," she explained. "We aim to empower enterprises in every industry to harness LLMs while ensuring compliance with their specific requirements."

Validating LLM performance is inherently complex, given the models' black-box nature and the vast space of possible outputs, but Patronus is committed to advancing AI evaluation. By pushing the boundaries of automated testing, the company aims to enable the accountable deployment of LLMs in real-world applications.

"Automating LLM performance measurement is challenging due to the diverse range of behaviors these generative models can exhibit," acknowledged Kannappan. "However, our research-driven methodology enables us to reliably and scalably identify errors that manual testing simply cannot."
