AI21 Labs Powers Up Generative AI with Jamba, a Transformer-Mamba Hybrid

Ever since the groundbreaking research paper "Attention Is All You Need" was published in 2017, transformers have taken center stage in the generative AI landscape. However, transformers are not the only viable approach to generative AI. AI21 Labs has introduced a novel framework called "Jamba," which seeks to advance beyond traditional transformers.

Jamba merges the Mamba model, which is built on a structured state space model (SSM), with transformer architecture to create an optimized generative AI solution. The name "Jamba" stands for Joint Attention and Mamba architecture, designed to harness the strengths of both SSMs and transformers. The model is released as open source under the Apache 2.0 license.

While Jamba is not poised to replace existing transformer-based large language models (LLMs), it is expected to serve as a valuable supplement in specific applications. AI21 Labs states that Jamba can outperform traditional transformer models in generative reasoning tasks, as evidenced by benchmarks like HellaSwag. However, it does not yet surpass transformer models on critical benchmarks such as the Massive Multitask Language Understanding (MMLU), which assesses problem-solving capabilities.

AI21 Labs specializes in generative AI for enterprise applications, raising $155 million in August 2023 to further its initiatives. Among its enterprise offerings is Wordtune, a tool designed to help organizations generate content that aligns with their tone and branding. The company reported in 2023 that it had successfully competed against generative AI giant OpenAI in securing enterprise clients.

Traditionally, AI21 Labs' LLM technology has utilized transformer architecture, including its Jurassic-2 LLM family, which is part of the AI21 Studio natural language processing (NLP) platform and available via APIs for enterprise integration. However, Jamba represents a shift toward a hybrid SSM and transformer model.

Despite transformers' prominent role in generative AI, they have certain limitations. A significant issue is that inference slows down as context windows expand. As AI21 Labs researchers explain, a transformer's attention mechanism scales with sequence length, reducing throughput because each new token must attend to the entire preceding sequence. This makes long-context applications inefficient.
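
As a rough, hypothetical illustration of that scaling (not AI21's code, and with a made-up hidden size), the sketch below counts the multiply-adds one new token's attention step performs against the context already generated; per-token cost grows linearly with context length, so the cost of a full sequence grows roughly quadratically.

```python
def attention_cost_per_token(context_len: int, d_model: int = 4096) -> int:
    """Multiply-adds for one new token's attention in a single layer:
    scoring against every cached key, then the weighted sum over every
    cached value. Both passes touch the whole preceding context."""
    score_pass = context_len * d_model   # q . k_i for each cached key
    value_pass = context_len * d_model   # sum of alpha_i * v_i
    return score_pass + value_pass

# Illustrative numbers only: per-token work keeps climbing with context length.
for n in (1_000, 10_000, 100_000):
    print(f"context {n:>7}: ~{attention_cost_per_token(n):,} multiply-adds per layer")
```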

Another challenge is the substantial memory footprint required to scale transformers. Their memory requirements grow with context length, making it hard to process long contexts or many parallel requests without considerable hardware resources. The SSM approach aims to address both the context and memory concerns.
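
To make the memory point concrete, here is a hedged back-of-envelope sketch; the layer count, head configuration, and 16-bit precision are illustrative assumptions rather than Jamba's or any particular model's figures. It shows how a transformer's key/value cache grows linearly with context length during decoding.

```python
def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    """Memory a decoder must keep resident for its key/value cache:
    two tensors (K and V) per layer, each of shape
    [context_len, n_kv_heads * head_dim], stored at 16-bit precision."""
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_value

for n in (8_192, 65_536, 262_144):
    print(f"context {n:>7}: ~{kv_cache_bytes(n) / 2**30:.1f} GiB of KV cache")
```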

The Mamba SSM architecture, originally developed by researchers at Carnegie Mellon and Princeton universities, is designed to require less memory and replaces attention with a selective state-space mechanism for handling large context windows. However, it struggles to match the output quality of transformer models. The Jamba hybrid approach combines the resource and context optimization of SSMs with the output quality of transformers.

AI21 Labs claims that the Jamba model features a 256K context window and offers three times the throughput on long contexts compared to Mixtral 8x7B. Notably, Jamba is positioned as the only model in its size class capable of fitting up to 140K context on a single GPU.

Like Mixtral, Jamba incorporates a Mixture of Experts (MoE) architecture. However, Jamba applies MoE within its hybrid SSM-transformer framework, which allows for greater efficiency. Specifically, Jamba's MoE layers activate only 12 billion of its 52 billion available parameters during inference, making it more efficient than a transformer-only model of equivalent size, according to AI21 Labs.
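
The snippet below is a minimal, hypothetical sketch of top-k MoE routing in PyTorch, not Jamba's actual layer code, and its layer sizes and expert counts are placeholders. It shows the general mechanism behind the active-parameter figure: a router scores all experts for each token, but only the top-scoring few are run, so most of the layer's parameters stay idle for any given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-k routing.
    All hyperparameters are illustrative, not Jamba's."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: [tokens, d_model]
        gate_logits = self.router(x)                   # [tokens, n_experts]
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(TinyMoELayer()(tokens).shape)  # each token only ran 2 of the 8 experts
```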

As it stands, Jamba is in its early stages and is not yet part of AI21 Labs' enterprise offerings, though the company plans to introduce an instruction-tuned version in beta on the AI21 Platform soon.
