The AI industry is increasingly shifting toward generative AI models with longer contexts. But models with large context windows tend to be computationally demanding. Or Dagan, product lead at AI startup AI21 Labs, believes that trade-off isn’t inevitable, and his company is releasing a new generative model to prove it.
In the realm of AI, contexts, or context windows, refer to the input information (like text) that a model considers before generating output (additional text). Models with shorter context windows often struggle to retain details from even recent interactions. In contrast, those with longer contexts can maintain a richer understanding of ongoing conversations, effectively enhancing the coherence of the output generated.
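As a rough illustration (hypothetical code, not AI21’s), the sketch below shows what a fixed context window means in practice: whatever falls outside the window is simply invisible to the model at generation time.

```python
# Hypothetical sketch: a model with a fixed context window can only attend
# to the most recent tokens that fit inside that window.

def visible_context(token_ids: list[int], context_window: int) -> list[int]:
    """Return only the most recent tokens that fit in the window."""
    return token_ids[-context_window:]

# A made-up conversation already tokenized into 10,000 token IDs.
conversation = list(range(10_000))

short_view = visible_context(conversation, 4_000)    # the first 6,000 tokens are forgotten
long_view = visible_context(conversation, 140_000)   # the whole exchange still fits

print(len(short_view), len(long_view))  # 4000 10000
```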
AI21 Labs has introduced Jamba, a powerful text generation and analysis model that offers capabilities similar to those of OpenAI's ChatGPT and Google's Gemini. Trained using a blend of public and proprietary datasets, Jamba is proficient in crafting text in English, French, Spanish, and Portuguese.
Notably, Jamba can handle up to 140,000 tokens while running on a single GPU with at least 80GB of memory, such as a high-end Nvidia A100. That works out to roughly 105,000 words, or about 210 pages, the length of a decent-sized novel.
Context windows are typically measured in tokens, the basic units of raw text and other data. For comparison, Meta’s Llama 2 has a context window of roughly 4,000 tokens, modest by contemporary standards, but requires only around 12GB of GPU memory to run.
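Exact token counts depend on the tokenizer, and AI21’s tokenizer will split text differently, but as an illustration the snippet below uses OpenAI’s open-source tiktoken library to count tokens in a sentence and then reproduces the back-of-the-envelope conversion from 140,000 tokens to words and pages, assuming roughly 0.75 words per token and about 500 words per page.

```python
# Illustration only: tokenizers differ, so these counts are approximate and
# AI21's own tokenizer would produce different numbers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # OpenAI's open-source tokenizer
sample = "Context windows are measured in tokens, the basic units of text."
token_ids = enc.encode(sample)
print(len(sample.split()), "words ->", len(token_ids), "tokens")

# Back-of-the-envelope conversion from the article's figures.
tokens = 140_000
words = int(tokens * 0.75)   # ~105,000 words at ~0.75 words per token
pages = words // 500         # ~210 pages at ~500 words per page
print(words, "words,", pages, "pages")
```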
At first glance, Jamba may not seem particularly remarkable given the plethora of available generative AI models, including Databricks’ DBRX and Llama 2. What truly sets Jamba apart is its underlying architecture, which combines two approaches: transformers and state space models (SSMs).
Transformers, favored for their effectiveness in complex reasoning tasks, power advanced models like GPT-4 and Google’s Gemini. Their standout feature is the “attention mechanism,” which weighs the relevance of every piece of input (for example, each word in a sentence) against the others and draws on that weighting to generate coherent output.
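For the curious, here is a minimal, self-contained sketch of scaled dot-product attention, the operation at the heart of that mechanism; it is a textbook simplification, not AI21’s implementation.

```python
# Minimal sketch of scaled dot-product attention (textbook form, not AI21's code).
# Each position's output is a weighted mix of all positions, with the weights
# derived from query-key similarity.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted sum of value vectors

seq_len, d_model = 6, 8                              # toy sizes
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((seq_len, d_model)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (6, 8): one contextualized vector per token
```

Because every token attends to every other token, the cost of this step grows quadratically with sequence length, which is a big part of why long context windows are so computationally demanding for pure transformers.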
SSMs, by contrast, combine useful qualities of older AI models, such as recurrent neural networks and convolutional neural networks, resulting in a more computationally efficient architecture suited to processing longer sequences of data.
SSMs come with limitations of their own. But early examples, including the open-source Mamba model developed by researchers at Princeton and Carnegie Mellon, can handle larger inputs than their transformer-based equivalents while outperforming them on language generation tasks.
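To see where that efficiency comes from, here is a deliberately simplified, hypothetical sketch of a discretized linear state space recurrence; Mamba itself adds input-dependent (“selective”) parameters and a hardware-aware scan, but the basic idea is a fixed-size hidden state updated once per input step, so the cost grows linearly with sequence length rather than quadratically.

```python
# Simplified sketch of a discretized linear state space model (not Mamba itself):
#   h_t = A h_{t-1} + B u_t,   y_t = C h_t
# The hidden state has a fixed size, so memory stays constant as sequences grow.
import numpy as np

def ssm_scan(A, B, C, inputs):
    h = np.zeros(A.shape[0])
    outputs = []
    for u in inputs:              # one cheap update per token
        h = A @ h + B * u
        outputs.append(C @ h)
    return np.array(outputs)

rng = np.random.default_rng(0)
N = 16                            # state size, independent of sequence length
A = 0.9 * np.eye(N)               # toy, stable state transition
B = rng.standard_normal(N)
C = rng.standard_normal(N)

y = ssm_scan(A, B, C, inputs=rng.standard_normal(100_000))   # a very long input sequence
print(y.shape)                    # (100000,)
```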
Jamba uses Mamba as part of its core model, and Dagan claims it delivers three times the throughput on long contexts compared with transformer-based models of comparable size.
“This release marks the first commercial-grade, production-scale model built on this innovative architecture,” Dagan stated in an interview. “Beyond its research implications, this architecture offers remarkable efficiency and throughput potential.”
Though Jamba is available under the Apache 2.0 license, an open-source license with relatively few usage restrictions, Dagan emphasizes that this is a research release not intended for commercial use. The model has no safeguards against generating toxic text and no mitigations for potential bias; a fine-tuned, ostensibly safer version is expected within weeks.
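For those who want to experiment with the research release anyway, a minimal usage sketch might look like the following; the Hugging Face repository ID and required library versions below are assumptions, not details confirmed by AI21, so check the company’s release notes first, and note that the model needs a large GPU (the figure cited above is roughly 80GB of memory).

```python
# Hypothetical usage sketch. The repository ID and required transformers version
# are assumptions; consult AI21's release notes for the actual details.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"   # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",             # requires the accelerate package and a large GPU (~80GB)
    torch_dtype="auto",
)

prompt = "In the loveliest town of all, where the"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the lack of safety tuning, any output from such an experiment should be treated as raw research output rather than a production-ready assistant.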
Even so, Dagan remains confident in Jamba’s potential, highlighting how its combination of substantial size and novel architecture lets it fit on a single GPU. “We anticipate performance will improve as Mamba receives further optimization,” he added.
In summary, AI21 Labs’ Jamba marks a significant stride in generative AI, showcasing the potential of SSM-based architectures to deliver better performance and efficiency while inviting further exploration from the AI research community.