Enterprises are optimistic about generative AI, investing billions to develop applications ranging from chatbots to search tools for various use cases. While nearly every major company has a generative AI initiative in progress, there is a critical distinction between committing to AI and successfully deploying it in production.
Today, California-based startup Maxim, founded by former Google and Postman executives Vaibhavi Gangwar and Akshay Deo, introduced an end-to-end evaluation and observability platform designed to address this gap. The company also announced $3 million in funding from Elevation Capital and several angel investors.
Maxim tackles a significant challenge faced by developers building large language model (LLM)-powered AI applications: monitoring the many components involved throughout the development lifecycle, where even minor errors can undermine reliability, erode trust, and delay delivery. Maxim's platform focuses on testing and improving AI quality and safety both before release and in production, establishing a standard that helps organizations streamline their AI application lifecycle and quickly ship high-quality products.
Challenges in Developing Generative AI Applications
Historically, software development followed a deterministic approach with standardized practices for testing and iteration, giving teams clear pathways to improve quality and security. Generative AI, however, brings numerous new variables, resulting in a non-deterministic paradigm. Developers must manage many moving parts, from the choice of model to the underlying data and how user queries are framed, all while ensuring quality, safety, and performance.
Organizations generally respond to these evaluation challenges in two main ways: hiring talent to oversee every variable or developing internal tools, both of which can lead to increased costs and divert attention from core business functions.
Recognizing this need, Gangwar and Deo launched Maxim to bridge the gap between the model and application layers of the generative AI stack. The platform provides comprehensive evaluation throughout the AI development lifecycle, from prompt engineering and pre-release testing to post-release monitoring and optimization.
Gangwar describes Maxim's platform as comprising four core components: an experimentation suite, an evaluation toolkit, observability, and a data engine.
The experimentation suite includes a prompt CMS, IDE, visual workflow builder, and connectors to external data sources, enabling teams to iterate on prompts, models, and parameters effectively. For instance, teams can experiment with different prompts on various models for a customer service chatbot.
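To make the mechanic concrete, the sketch below runs a small grid of prompt variants across candidate models and collects the outputs for side-by-side comparison. This is a generic Python illustration, not Maxim's SDK; the call_model helper and the model names are hypothetical placeholders for whatever provider client a team actually uses.

```python
from itertools import product

# Hypothetical stand-in for a real provider call (e.g. an OpenAI- or
# Anthropic-compatible client); replace with your own client code.
def call_model(model: str, prompt: str) -> str:
    return f"[{model}] canned response to: {prompt[:40]}..."

# Two prompt variants for a customer-service chatbot (illustrative).
PROMPTS = {
    "concise": "Answer the customer's question in two sentences:\n{question}",
    "empathetic": "Acknowledge the customer's frustration, then answer:\n{question}",
}
MODELS = ["model-a", "model-b"]  # illustrative model identifiers

question = "Why was my order delayed?"

# Run every prompt variant against every candidate model so the
# outputs can be compared side by side.
results = {}
for (prompt_name, template), model in product(PROMPTS.items(), MODELS):
    results[(prompt_name, model)] = call_model(model, template.format(question=question))

for (prompt_name, model), output in results.items():
    print(f"{prompt_name} / {model}: {output}")
```

In practice the same grid extends to parameters such as temperature, system prompts, and retrieval settings, which is where a managed experimentation suite earns its keep.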
The evaluation toolkit offers a unified framework for both AI-driven and human evaluations, allowing teams to quantitatively assess improvements or regressions through comprehensive testing. Results are visualized in dashboards that cover metrics such as tone, accuracy, toxicity, and relevance.
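A simple scoring harness illustrates the pattern behind such a framework, though the evaluators here are toy heuristics rather than the LLM-as-judge or human review a production toolkit would use; the metric definitions and test cases are assumptions for illustration only.

```python
import statistics

# Toy heuristic evaluators standing in for real relevance and toxicity
# scoring; purely illustrative.
def relevance(answer: str, reference: str) -> float:
    ref_terms = set(reference.lower().split())
    ans_terms = set(answer.lower().split())
    return len(ref_terms & ans_terms) / max(len(ref_terms), 1)

def toxicity(answer: str) -> float:
    blocklist = {"stupid", "useless"}
    return 1.0 if any(w in blocklist for w in answer.lower().split()) else 0.0

test_cases = [
    {"answer": "Your order shipped late due to a warehouse backlog.",
     "reference": "order shipped late warehouse backlog"},
    {"answer": "That is a stupid question.",
     "reference": "order shipped late warehouse backlog"},
]

# Aggregate per-metric scores across the test suite: the kind of
# run-level summary a dashboard would visualize to spot regressions.
report = {
    "relevance": statistics.mean(relevance(c["answer"], c["reference"]) for c in test_cases),
    "toxicity": statistics.mean(toxicity(c["answer"]) for c in test_cases),
}
print(report)
```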
Observability is key in the post-release phase, enabling real-time monitoring of production logs and automated evaluations to identify and resolve live issues, ensuring quality standards are met.
According to Gangwar, “Users can establish automated controls for various quality, safety, and security signals on production logs. They can also set real-time alerts for regressions in metrics that matter most, such as performance, cost, and quality.”
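In spirit, such controls amount to threshold rules evaluated over metrics aggregated from production logs. The sketch below shows one way that could look in plain Python; the metric names, thresholds, and AlertRule structure are illustrative assumptions, not Maxim's configuration schema.

```python
from dataclasses import dataclass

# Hypothetical alert rule over aggregated production metrics.
@dataclass
class AlertRule:
    metric: str
    threshold: float
    direction: str  # "above" or "below"

    def fires(self, value: float) -> bool:
        return value > self.threshold if self.direction == "above" else value < self.threshold

rules = [
    AlertRule(metric="p95_latency_ms", threshold=2000, direction="above"),
    AlertRule(metric="cost_per_request_usd", threshold=0.05, direction="above"),
    AlertRule(metric="quality_score", threshold=0.8, direction="below"),
]

# Metrics aggregated from the latest window of production logs (made up).
window_metrics = {"p95_latency_ms": 2350, "cost_per_request_usd": 0.03, "quality_score": 0.76}

for rule in rules:
    value = window_metrics[rule.metric]
    if rule.fires(value):
        print(f"ALERT: {rule.metric}={value} breached {rule.direction} {rule.threshold}")
```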
Using insights from the observability suite, users can swiftly address issues. If data quality is the concern, the data engine allows for seamless curation and enrichment of datasets for fine-tuning.
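As a rough illustration of that curation step, the snippet below filters logged interactions by a quality score, drops flagged records, and exports the survivors in the prompt/completion JSONL shape many fine-tuning APIs accept; the field names and thresholds are hypothetical.

```python
import json

# Illustrative production log records; field names are assumptions.
logs = [
    {"prompt": "Where is my order?", "response": "It ships tomorrow.", "quality_score": 0.92, "flagged": False},
    {"prompt": "Cancel my plan.", "response": "Sure, done!", "quality_score": 0.40, "flagged": True},
    {"prompt": "Do you ship abroad?", "response": "Yes, to 40 countries.", "quality_score": 0.88, "flagged": False},
]

# Curate: keep only high-quality, unflagged interactions and write them
# out as JSONL for a fine-tuning run.
curated = [
    {"prompt": rec["prompt"], "completion": rec["response"]}
    for rec in logs
    if rec["quality_score"] >= 0.85 and not rec["flagged"]
]

with open("finetune_dataset.jsonl", "w") as f:
    for row in curated:
        f.write(json.dumps(row) + "\n")

print(f"kept {len(curated)} of {len(logs)} records")
```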
Accelerated Application Deployments
Though still in its early stages, Maxim claims to have assisted "a few dozen" early partners in testing, iterating, and deploying their AI products at a rate five times faster than before, targeting sectors like B2B tech, generative AI services, BFSI, and Edtech — industries where evaluation challenges are particularly acute. As the company expands its operations, it plans to enhance platform capabilities, focusing on mid-market and enterprise clients.
Maxim's platform also includes enterprise-centric features such as role-based access controls, compliance, team collaboration, and deployment options in a virtual private cloud.
While Maxim's approach to standardized testing and evaluation is noteworthy, it faces challenges competing with well-funded rivals like Dynatrace and Datadog, which continually evolve their offerings.
Gangwar notes that most competitors focus on a single slice of the problem, such as performance monitoring, quality evaluation, or observability, whereas Maxim aims to consolidate all evaluation needs in a single, integrated platform.
"The development lifecycle requires holistic management of testing-related needs, which we believe will drive significant productivity and quality improvements for sustainable applications," she asserts.
Looking ahead, Maxim intends to expand its team and operations while forging more partnerships with enterprises focused on AI product development. Future enhancements may include proprietary domain-specific evaluations for quality and security, as well as a multi-modal data engine.