The well-funded French AI startup Mistral, renowned for its advanced open-source AI models, has launched two new large language models (LLMs): a math-focused model and a code-generation model for developers, the latter built on the Mamba architecture introduced by researchers late last year.
Mamba aims to improve on the efficiency of the transformer architecture used by most leading LLMs by replacing its attention mechanism with a selective state-space mechanism. This allows Mamba-based models to deliver faster inference and handle longer contexts than typical transformer models. Other companies, including AI21, have also released AI models built on this architecture.
Mistral’s new Codestral Mamba 7B is designed for rapid response times even with long input texts, making it well suited to local coding projects. Available through Mistral's la Plateforme API, it can handle inputs of up to 256,000 tokens, double the context window of OpenAI’s GPT-4o.
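As a rough illustration of what that looks like in practice, the sketch below sends a coding prompt to la Plateforme. It assumes the API follows Mistral's standard chat-completions format and that the model is exposed under an identifier such as "codestral-mamba-latest"; neither detail is confirmed by this article, so check Mistral's documentation before relying on it.

```python
# Hypothetical sketch: querying Codestral Mamba through Mistral's
# la Plateforme chat-completions endpoint. The model identifier
# "codestral-mamba-latest" and the exact request shape are assumptions
# based on Mistral's OpenAI-style API, not details confirmed in this article.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"

def complete_code(prompt: str) -> str:
    """Send a single coding prompt and return the model's reply."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "codestral-mamba-latest",  # assumed model name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(complete_code("Write a Python function that reverses a linked list."))
```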
In benchmarking tests, Codestral Mamba outperformed rival open-source models such as CodeLlama 7B, CodeGemma 1.1 7B, and DeepSeek on the HumanEval evaluation.
Developers can modify and deploy Codestral Mamba from its GitHub repository and Hugging Face under the open-source Apache 2.0 license. Mistral asserts that the earlier Codestral model surpassed other code generators, including CodeLlama 70B and DeepSeek Coder 33B.
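For developers who want to run the weights locally rather than call the API, a minimal sketch might look like the following. It assumes the checkpoint is published on Hugging Face under a repository ID like "mistralai/Mamba-Codestral-7B-v0.1" and that a recent transformers release supports the Mamba-based architecture; the model card may recommend a different loading path (for example, Mistral's own inference library).

```python
# Hypothetical sketch: running Codestral Mamba locally from a Hugging Face
# checkpoint. The repository ID and transformers support for the Mamba-based
# architecture are assumptions; consult the model card for the official setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mamba-Codestral-7B-v0.1"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # keeps a 7B model within a single modern GPU
    device_map="auto",
)

prompt = "# Write a function that checks whether a string is a palindrome\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```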
AI-powered code generation and coding-assistant tools have become essential applications, with platforms such as GitHub Copilot, Amazon CodeWhisperer, and Codeium gaining traction.
Mistral's second release, Mathstral 7B, targets math-related reasoning and scientific discovery and was developed in collaboration with Project Numina. Mathstral has a 32k-token context window and is released under the Apache 2.0 open-source license. Mistral says it outperforms existing models designed for math reasoning and delivers "significantly better results" on benchmarks that allow more inference-time computation. Users can run it as is or fine-tune it for specific needs.
“Mathstral exemplifies the excellent performance-to-speed tradeoffs achievable when constructing models for specialized applications—a philosophy we are committed to in la Plateforme, particularly with its enhanced fine-tuning capabilities,” Mistral shared in a blog post.
Mathstral is accessible through Mistral's la Plateforme and Hugging Face.
Competing directly with industry leaders like OpenAI and Anthropic, Mistral recently secured $640 million in Series B funding, boosting its valuation to nearly $6 billion, with investments from tech giants including Microsoft and IBM.