As enterprise organizations pursue the agentic future, the architecture of AI models poses a significant challenge. Ori Goshen, CEO of AI21, emphasizes the need for alternative model architectures to create more efficient AI agents, as the prevailing Transformer models present limitations that hinder the establishment of a multi-agent ecosystem.
In a recent interview, Goshen highlighted the drawbacks of the Transformer architecture: its compute cost climbs as the context grows, slowing performance and driving up costs. "Agents require multiple calls to LLMs with extensive context at each step, making Transformer a bottleneck," he noted.
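To make the bottleneck concrete, here is a minimal, purely illustrative Python sketch (not AI21 code; `call_llm` is a hypothetical stand-in for any hosted model API). It shows how a typical agent loop re-sends its entire accumulated context on every step, so if attention cost grows roughly quadratically with context length, each call is more expensive than the last.

```python
# Illustrative sketch only: why an agent loop strains a Transformer-backed LLM.
# `call_llm` is a hypothetical placeholder, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for a hosted LLM call."""
    return f"<model response to {len(prompt)} chars of context>"

def run_agent(task: str, tool_results: list[str], steps: int = 4) -> str:
    context = task
    for step in range(steps):
        # The full history is re-sent to the model on every call.
        answer = call_llm(context)
        context += "\n" + answer + "\n" + tool_results[step % len(tool_results)]
        # Rough cost model: self-attention FLOPs scale ~O(n^2) in context length n,
        # so successive agent steps get steadily slower and pricier.
        print(f"step {step}: context={len(context):,} chars, "
              f"relative attention cost ~{len(context) ** 2:,}")
    return context

if __name__ == "__main__":
    run_agent("Summarize open support tickets", ["<tool result A>", "<tool result B>"])
```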
AI21 advocates a more flexible approach to model architecture, arguing that while Transformers can be a viable option, they shouldn't be the default. The company's Jamba architecture (short for Joint Attention and Mamba) leverages the Mamba framework developed by researchers at Princeton and Carnegie Mellon to speed up inference and extend context capabilities.
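As a rough illustration of the hybrid idea (the layer count and attention-to-Mamba ratio below are assumptions for the sketch, not AI21's published configuration), interleaving occasional attention layers among mostly Mamba layers retains some of attention's strengths while letting most of the stack scale linearly with context:

```python
# Simplified sketch of a hybrid attention/Mamba layer stack.
# The numbers here are illustrative assumptions, not Jamba's actual config.

def build_layer_stack(num_layers: int = 32, attention_every: int = 8) -> list[str]:
    layers = []
    for i in range(num_layers):
        if (i + 1) % attention_every == 0:
            layers.append("attention")  # occasional full-attention layer
        else:
            layers.append("mamba")      # linear-time state-space layer
    return layers

print(build_layer_stack())  # mostly 'mamba', with a periodic 'attention' layer
```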
Goshen explains that Mamba-based models improve memory efficiency, which makes them a better fit for agents, especially agents that integrate with other models. In his view, the recent surge in interest in AI agents has mostly exposed the limitations of LLMs built on Transformers.
"The primary reason agents remain in development—and have not yet seen widespread production—is reliability. Since LLMs are inherently stochastic, additional measures must be implemented to ensure the necessary reliability," Goshen stated.
AI agents have surfaced as a leading trend in enterprise AI this year, with several companies launching new platforms for agent development. ServiceNow, for instance, upgraded its Now Assist AI platform to include a library of AI agents, while Salesforce introduced Agentforce. Meanwhile, Slack is letting users integrate agents from various companies, including Salesforce, Cohere, and Adobe.
Goshen believes that with the right mix of models and architectures, interest in AI agents will escalate. "Current use cases, like chatbot question-and-answer functions, mainly resemble enhanced search. True intelligence lies in the ability to connect and retrieve diverse information from multiple sources," he commented. AI21 is actively developing its offerings around AI agents to meet this demand.
As the Mamba architecture gains traction, Goshen remains a vocal supporter, arguing that the cost and complexity of Transformers limit their practical applications. Unlike Transformers, whose attention mechanism must revisit the entire context on every step, Mamba is a state space design that keeps a compact, fixed-size state, prioritizing memory efficiency and making better use of GPU compute.
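A back-of-the-envelope memory comparison helps explain the claim. The sketch below uses assumed dimensions (layer count, heads, state size) purely for illustration: a Transformer's key-value cache grows with the number of tokens it has seen, while a state-space model like Mamba carries a fixed-size state no matter how long the context gets.

```python
# Rough, assumed numbers for illustration only (not benchmarks of any real model).

def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    # A Transformer stores keys and values for every past token at every layer.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

def ssm_state_bytes(layers: int = 32, state_dim: int = 16,
                    channels: int = 4096, bytes_per_value: int = 2) -> int:
    # A state-space layer keeps a fixed-size state, independent of context length.
    return layers * state_dim * channels * bytes_per_value

for tokens in (8_000, 128_000, 256_000):
    print(f"{tokens:>7} tokens: KV cache ~{kv_cache_bytes(tokens) / 1e9:.2f} GB, "
          f"SSM state ~{ssm_state_bytes() / 1e6:.2f} MB")
```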
Demand for Mamba is on the rise, with other developers releasing Mamba-based models such as Mistral's Codestral Mamba 7B and the Technology Innovation Institute's Falcon Mamba 7B. Nevertheless, Transformers remain the standard choice for foundation models, including OpenAI's GPT series.
Ultimately, Goshen notes that enterprises prioritize reliability over any particular architecture, but he warns organizations to be wary of impressive demos that promise more than today's products can deliver. "We're in a phase where captivating demonstrations are prevalent, but we are still transitioning towards an applicable product phase," he cautioned. "While enterprise AI is valuable for research, it is not yet ready to inform critical business decisions."