The AI race is accelerating rapidly. On the heels of Meta's launch of the open-source Llama 3.1 as a strong competitor to leading closed-source models, French AI startup Mistral has entered the fray with its next-generation model, Mistral Large 2, featuring 123 billion parameters. It is worth noting, however, that the model is "open" only for non-commercial research: third parties may download and fine-tune it, but commercial use requires a separate license, as stated in Mistral's official blog and in a post by research scientist Devendra Singh Chaplot.
Although Mistral Large 2 has far fewer parameters than Llama 3.1's 405 billion, it delivers comparable performance. Available on Mistral's main platform and through cloud partners, the model improves on the original Mistral Large with stronger multilingual capabilities and better reasoning, code generation, and mathematics. It is regarded as a GPT-4-class model, closely matching GPT-4o, Llama 3.1-405B, and Anthropic's Claude 3.5 Sonnet across a range of benchmarks.
Mistral continues to innovate, emphasizing cost efficiency, speed, and performance, while introducing features such as advanced function calling and retrieval for building high-performing AI applications. The release is part of Mistral's broader strategy to stay competitive in the AI landscape: the company has secured substantial funding, released task-specific models, and formed partnerships with industry leaders.
Mistral Large 2: What to Expect
In February, Mistral introduced its original Large model with a context window of 32,000 tokens, claiming a nuanced grasp of grammar and cultural context across multiple languages, including English, French, Spanish, German, and Italian. Building upon this, the new model features a significantly larger 128,000-token context window, matching the capabilities of OpenAI’s GPT-4o and Llama 3.1.
Mistral Large 2 now supports many new languages, including Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. The model excels in complex tasks, such as synthetic text generation and code generation, demonstrating enhanced reasoning capabilities.
High Performance on Benchmarks
On the Multilingual MMLU benchmark, Mistral Large 2 performed on par with Meta's Llama 3.1-405B while costing less to serve thanks to its smaller size. With 123 billion parameters, the model is optimized for single-node inference, enabling high throughput, as noted in a company blog post.
The new model addresses previous shortcomings in coding tasks, having been trained on a large volume of code. It now generates code in more than 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash, with high accuracy, as demonstrated by the MultiPL-E benchmark. On the HumanEval and HumanEval Plus code-generation benchmarks, it surpassed Claude 3.5 Sonnet and Claude 3 Opus, trailing only GPT-4o. On math-focused benchmarks such as GSM8K and Math Instruct, it secured the second-highest score.
Focus on Instruction-Following
With increased enterprise AI adoption, Mistral has prioritized minimizing hallucinations in Large 2 by refining the model to respond more cautiously and accurately. When lacking sufficient information for an answer, it transparently communicates this to users. Additionally, Mistral has enhanced the model's ability to follow user instructions and engage in multi-turn conversations, striving for concise responses suitable for enterprise applications.
Currently, Mistral Large 2 is accessible through Mistral's API endpoint and prominent cloud platforms including Google Vertex AI, Amazon Bedrock, Azure AI Studio, and IBM watsonx. Users can also try the model through Le Chat, the company's chatbot, to get a feel for its capabilities.
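For developers who want to try the API route, here is a minimal sketch of a chat-completion call using only the Python standard library. It assumes an OpenAI-style `/v1/chat/completions` endpoint at `api.mistral.ai`, a `mistral-large-latest` model alias, and an API key in the `MISTRAL_API_KEY` environment variable; check Mistral's API documentation for the authoritative endpoint path, model names, and request schema.

```python
import json
import os
import urllib.request

# Assumed endpoint path; verify against Mistral's API docs.
API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "mistral-large-latest") -> dict:
    """Build an OpenAI-style chat-completion payload for a single user prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask_mistral(prompt: str) -> str:
    """Send the prompt to the API and return the assistant's reply text."""
    payload = build_chat_request(prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # API key is read from the environment, never hard-coded.
            "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses nest the reply under choices[0].message.
    return body["choices"][0]["message"]["content"]
```

A call such as `ask_mistral("Summarize the Mistral Large 2 launch in one sentence.")` would return the model's reply as a string; Mistral also ships an official Python client that wraps this same pattern.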