After months of anticipation and a recent leak, Meta has officially launched its most advanced open-source language model, Llama-3.1, featuring 405 billion parameters.
Parameters are the internal values a model learns during training and that determine how it behaves; a higher count often signals a model that can follow complex instructions more accurately than smaller ones.
Llama-3.1 builds on the Llama-3 family introduced in April 2024, which was previously available only in 8-billion and 70-billion parameter versions. The new 405-billion-parameter model can serve as a teacher for smaller models and excels at synthetic data generation, and it ships under an updated open license that explicitly permits model distillation and synthetic data creation.
“This model delivers state-of-the-art performance among open-source models, making it highly competitive with proprietary models,” said Ragavan Srinivasan, Meta’s VP of AI Program Management.
At launch, Llama-3.1 is multilingual, supporting prompts in English, Portuguese, Spanish, Italian, German, French, Hindi, and Thai. The previous Llama-3 models will also gain multilingual capabilities.
With an expanded context window of 128,000 tokens, Llama-3.1 can process a volume of text equivalent to a nearly 400-page novel.
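The "400-page novel" figure follows from standard rules of thumb. A quick back-of-envelope check (the words-per-token and words-per-page ratios below are generic estimates, not Llama-specific figures):

```python
# Back-of-envelope: how much text fits in a 128,000-token context window.
# Both ratios are rough rules of thumb, not figures from Meta.
WORDS_PER_TOKEN = 0.75   # English text averages about 3/4 of a word per token
WORDS_PER_PAGE = 250     # a typical novel page

context_tokens = 128_000
words = context_tokens * WORDS_PER_TOKEN   # 96,000 words
pages = words / WORDS_PER_PAGE             # 384 pages

print(f"~{words:,.0f} words, ~{pages:.0f} pages")
```

At roughly 384 pages, the "nearly 400-page novel" comparison holds up.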
Benchmark Testing and Performance
In a recent blog post, Meta highlighted that Llama-3.1 was tested on over 150 benchmark datasets and underwent human-guided evaluations for practical applications. The 405-billion-parameter model is poised to compete effectively with top models like GPT-4, GPT-4o, and Claude 3.5 Sonnet, while the updated smaller models are similarly competitive within their own size classes.
The Llama family has gained popularity among developers for its accessibility across various platforms. Meta asserts that Llama-3 models can match or even surpass rival models in multiple benchmarks, excelling in tasks like multiple-choice questions and coding, particularly against Google’s Gemma and Gemini, Anthropic’s Claude 3 Sonnet, and Mistral’s 7B Instruct.
Teaching Model Capabilities
The updated licensing across all Meta models facilitates model distillation, enabling knowledge transfer from larger models like Llama-3.1 to their smaller counterparts.
Srinivasan described the 405 billion parameter model as a “teaching model,” rich in knowledge and reasoning capabilities. “This model serves as a teacher, allowing users to distill its knowledge into smaller and more efficient models tailored to specific applications,” he noted.
By leveraging model distillation, users can either create new models or refine existing Llama-3.1 versions, focusing on specific use cases. Additionally, the ability to generate synthetic data allows for learning while safeguarding copyright and sensitive information.
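In broad terms, distillation of this kind trains a smaller model to match the larger model's output distribution rather than just hard labels. A minimal, generic sketch of the core loss computation in plain Python (the temperature value and toy logits are illustrative and not part of Meta's published recipe):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures soften the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution ("soft labels")
    and the student's; this is the signal the student is trained to minimize."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Toy example: the loss pushes the student toward the teacher's preferences.
teacher = [4.0, 1.0, 0.5]   # teacher strongly favors token 0
student = [1.0, 1.0, 1.0]   # an untrained student is roughly uniform
print(distillation_loss(teacher, student))
```

The soft targets carry more information than a single "correct" token, which is why a 405B teacher can transfer nuanced behavior to a much smaller student.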
Innovative Model Architecture
To ensure training at this scale remained tractable, Meta optimized its training stack and employed over 16,000 Nvidia H100 GPUs. Researchers opted for a standard decoder-only transformer architecture rather than the increasingly popular mixture-of-experts design, prioritizing training stability.
An iterative post-training procedure, combining rounds of supervised fine-tuning with high-quality synthetic data generation, further enhances overall performance.
As with prior Llama models, Llama-3.1 is openly available and accessible through platforms including AWS, Nvidia, Groq, Dell, Databricks, Microsoft Azure, and Google Cloud.
Availability and Further Use
Matt Wood, AWS VP for AI, confirmed that Llama-3.1 will be offered on both AWS Bedrock and SageMaker. AWS users can fine-tune Llama-3.1 models using AWS services while integrating additional safety features.
“Customers can leverage the extensive capabilities of Llama, modify the models, and utilize all the tools available on AWS,” Wood stated.
Llama-3.1 (405B) will also be accessible via WhatsApp and Meta AI, putting the model directly in the hands of consumer users.