In the era of large language models (LLMs), companies are eager to implement the most effective model for their unique applications. While this task may seem straightforward, many organizations encounter a significant challenge: identifying the best solution for their specific use cases in a rapidly evolving landscape.
Enter Not Diamond, a startup emerging from stealth mode, which argues that the answer lies in smart routing.
Not Diamond, based in San Francisco, has developed an LLM router that lets enterprises draw on multiple models at once, directing each query to the most suitable one. This approach improves output quality while optimizing for factors such as latency and cost.
“Our core belief is that the future won’t consist of a single dominant model or company—rather, there will be numerous foundation models, countless specialized variants, and a multitude of custom inference engines operating above them. We founded Not Diamond to facilitate this multi-model future, offering the world’s most advanced infrastructure for routing between models,” says Tomás Hernando Kofman, co-founder and CEO of Not Diamond.
Despite its early stage, Not Diamond has attracted significant attention, securing $2.3 million in funding from defy.vc and notable figures in the AI community, including Jeff Dean from Google DeepMind, Julien Chaumond from Hugging Face, Zack Kass from OpenAI, and others.
The LLM Cost versus Task-Specific Performance Challenge
Navigating the current landscape of large language models is complex, as each model—whether open-source or proprietary—has its strengths and weaknesses. Selecting a model with extensive context length and high performance can often be prohibitively expensive. Conversely, more affordable options may lack critical capabilities or exhibit high latency.
Adding to the challenge, new models arrive at a rapid pace, and existing ones continue to receive substantial updates. Llama 3.1 is one example of how quickly open-source models are advancing.
How Not Diamond Empowers Enterprises
Kofman, who previously developed a no-code AI product, faced the LLM dilemma firsthand. He envisioned a solution: an interface enabling enterprises to access a network of specialized models instead of relying on a single option. This vision led him to collaborate with machine learning experts Tze-Yang Tung and Jeffrey Akiki to establish Not Diamond, focused on creating infrastructure that intelligently routes queries among models.
“Effective routing infrastructure is vital for maximizing AI system performance. Smaller, specialized models can outperform larger ones in specific domains, and routing provides these models with the resilience of general ones. This approach is not only computation-efficient, but it also enhances interpretability and safety,” Kofman explained.
Not Diamond’s Innovative Technology
At the heart of Not Diamond's solution are a 'meta-model' and an LLM ranking algorithm. The router analyzes each incoming query and automatically directs it to the model best equipped to answer accurately, while keeping cost and latency as low as possible. As a result, teams no longer need to call large models for straightforward queries.
Benchmark results indicate that Not Diamond's router, drawing on multiple LLMs, outperforms individual models such as Llama 3.1 and GPT-4.
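To make the routing idea concrete, here is a minimal sketch in Python. It is not Not Diamond's implementation, which the article does not describe in detail. The `Candidate` class, the `route` function, the quality floor, the weights, and all the numbers are hypothetical. The sketch assumes a ranking model has already predicted a quality score for each candidate LLM on a given query, and it then picks the cheapest and fastest candidate that is predicted to be good enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate LLM for a single query (all values are illustrative)."""
    name: str
    predicted_quality: float  # 0..1, assumed output of a learned ranking model
    cost_per_call: float      # USD per call, made-up figure
    latency_s: float          # expected seconds per call, made-up figure


def route(candidates, quality_floor=0.8, cost_weight=1.0, latency_weight=0.1):
    """Pick the lowest-penalty candidate whose predicted quality clears the floor.

    If no candidate clears the floor, fall back to the one with the
    highest predicted quality, regardless of cost or latency.
    """
    good_enough = [c for c in candidates if c.predicted_quality >= quality_floor]
    if not good_enough:
        return max(candidates, key=lambda c: c.predicted_quality)
    return min(
        good_enough,
        key=lambda c: cost_weight * c.cost_per_call + latency_weight * c.latency_s,
    )


# A simple query: the small model is predicted to be good enough, so it wins.
easy = [
    Candidate("small-model", predicted_quality=0.85, cost_per_call=0.001, latency_s=0.3),
    Candidate("large-model", predicted_quality=0.95, cost_per_call=0.030, latency_s=1.2),
]

# A hard query: only the large model clears the quality floor.
hard = [
    Candidate("small-model", predicted_quality=0.40, cost_per_call=0.001, latency_s=0.3),
    Candidate("large-model", predicted_quality=0.90, cost_per_call=0.030, latency_s=1.2),
]

print(route(easy).name)
print(route(hard).name)
```

The quality floor is what spares the large model on simple queries: once a cheaper model is predicted to be adequate, cost and latency decide the outcome.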
To develop this capability, Not Diamond created a substantial evaluation dataset to measure LLM performance across various tasks, from answering questions to coding and reasoning. The company then trained a ranking algorithm to identify the most compatible LLM for each query, driving the routing process.
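The link between an evaluation dataset and a ranking can be illustrated with a toy example. The sketch below is a deliberately simple stand-in and not the company's trained ranking algorithm: it averages evaluation scores per task type and sorts models by that average. The function name `build_rankings`, the record layout, the model names, and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_rankings(records):
    """Rank models per task type by their mean evaluation score.

    `records` is an iterable of (task, model, score) tuples. A learned
    ranker would predict per-query scores; this toy version only averages
    the scores observed for each task type.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for task, model, score in records:
        scores[task][model].append(score)

    rankings = {}
    for task, by_model in scores.items():
        averages = {model: mean(values) for model, values in by_model.items()}
        # Highest average first; ties are broken alphabetically by model name.
        rankings[task] = sorted(averages, key=lambda m: (-averages[m], m))
    return rankings


# Made-up evaluation records: (task type, model name, score between 0 and 1).
records = [
    ("coding", "model-a", 0.9),
    ("coding", "model-a", 0.7),
    ("coding", "model-b", 0.7),
    ("qa", "model-a", 0.6),
    ("qa", "model-b", 0.8),
]

rankings = build_rankings(records)
print(rankings["coding"])
print(rankings["qa"])
```

A router built on such a table would send coding queries to the top-ranked coding model and question-answering queries to the top-ranked model for that task.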
In December 2023, Not Diamond released an open-source preview of its router, allowing enterprises to route queries between GPT-3.5 and GPT-4, with plans to expand to additional models.
Teams that want to integrate the router into their internal workflows for specific applications can also supply their own evaluation datasets to train a custom router that optimizes model selection for their use case. The router additionally offers data hashing and prompt translation features to enhance performance.
Accelerating Developer Adoption
Although still in its infancy, Not Diamond is experiencing significant uptake from early-stage companies and independent developers. While specific user counts remain undisclosed, one enterprise customer, Samwell AI, reported a 10% improvement in LLM output quality alongside a 10% reduction in inference costs and latency through the use of Not Diamond’s technology.
With backing from industry leaders, the company aims to build on its progress, accelerating product development and increasing adoption rates. Kofman emphasizes that Not Diamond has a “host of additional product features” in development, although specifics remain under wraps.
Not Diamond faces competition in smart query routing from startups including Martian and Unify. Kofman, however, asserts that Not Diamond stands apart on routing speed, prompt optimization, and privacy features.