Ensuring Safe Deployment of AI Models in Business
For businesses aiming to implement AI models in their operations—whether for employees or customers—the most pressing question isn't merely what model to choose or its intended use. Instead, it revolves around determining when the selected model is safe for deployment.
How much backend testing is required? What types of tests should be conducted? Companies understandably wish to avoid mishaps like those seen with some car dealerships using ChatGPT for customer support, where users tricked the system into agreeing to sell cars for $1.
The Importance of Thorough Testing
Properly testing AI models, especially finely-tuned versions, can be the difference between a successful launch and one that jeopardizes a company’s reputation and finances. Kolena, a San Francisco-based startup co-founded by a former Amazon senior engineering manager, recently announced the release of its AI Quality Platform. This web application is designed to facilitate rapid and accurate testing and validation of AI systems.
The platform encompasses various functions, including data quality monitoring, model testing, A/B testing, and monitoring for data drift and model degradation over time. It also includes debugging capabilities.
“Solving this problem is essential to advancing AI adoption in enterprises,” remarked Mohamed Elgendy, Kolena’s co-founder and CEO, during an exclusive media interview. Elgendy brings valuable experience from his past roles as VP of Engineering at Rakuten and a senior engineering manager at Amazon, giving him insight into the challenges enterprises face with AI deployment.
How Kolena’s AI Quality Platform Works
Kolena's solution aims to assist software developers and IT personnel in creating safe, reliable, and equitable AI systems for real-world applications. By enabling rapid development of detailed test cases from datasets, the platform allows for rigorous examination of AI/ML models in realistic scenarios, moving beyond broad statistical metrics that may obscure critical performance insights.
Each Kolena customer connects their chosen model via API and supplies their own dataset along with functional requirements for model operations—be it text, imagery, code, audio, or other content. Customers can also assess attributes like bias and the diversity of age, race, and ethnicity among various metrics. Kolena conducts tests simulating hundreds or thousands of interactions to identify any undesirable outcomes, including how frequently they occur and under what circumstances.
Furthermore, Kolena re-tests models following updates, retraining, or adjustments made by either providers or customers.
“Elgendy explains: “It will run tests and pinpoint exactly where your model has degraded. Kolena turns testing into a precise engineering discipline, similar to software development.”
This capability is not only valuable for enterprises but also for AI model providers. For instance, Elgendy noted that Google’s Gemini, which faced scrutiny for generating inaccurate images, could have benefited from the rigorous testing provided by Kolena's platform prior to its release.
Extensive Testing Before Launch
In line with its ambitions, Kolena ensures extensive testing of its AI Quality Platform before a broader release. The company has been conducting closed beta testing with Fortune 500 companies and startups over the past 24 months, refining its platform based on user feedback and needs.
“We worked closely with a select group of customers to define both known and unknown challenges,” Elgendy explained. This group has collectively executed "tens of thousands" of tests on AI models using Kolena’s platform.
Looking ahead, Kolena seeks to engage customers across three key areas: 1. Builders of AI foundation models, 2. Buyers within the tech sector, and 3. Buyers from non-tech industries. For example, one partner is utilizing a large language model solution to enhance fast food drive-through operations, while another targets autonomous vehicle developers.
Pricing and Accessibility
Kolena’s AI Quality Platform operates on a software-as-a-service (SaaS) model, featuring three pricing tiers that scale with a company's AI growth, from initial data quality assessments to model training and eventual deployment.