Mistral Unveils Codestral: A Cutting-Edge Code Generation LLM Claimed to Outperform Competitors

Today, Paris-based Mistral, an AI startup that made headlines with Europe's largest-ever seed round last year, has entered the programming and development arena with the launch of Codestral, its inaugural code-focused large language model (LLM).

Now available under a non-commercial license, Codestral is a 22-billion-parameter, open-weight generative AI model that specializes in coding tasks, from code generation to completion.

Mistral states that the model supports over 80 programming languages, making it a useful resource for software developers building AI applications. The company asserts that Codestral surpasses previous coding models, including CodeLlama 70B and DeepSeek Coder 33B, and is being adopted by industry leaders like JetBrains, Sourcegraph, and LlamaIndex.

A High-Performance Tool for Developers

Codestral 22B has a context length of 32K tokens, letting developers write and interact with code across various environments and projects. Trained on a dataset covering more than 80 programming languages, it is suited to diverse coding tasks such as generating code from scratch, completing functions, writing tests, and filling in gaps in partial code. Supported languages include popular options like SQL, Python, Java, C, and C++, along with more specialized ones like Swift and Fortran.

Mistral claims that Codestral can enhance developer productivity, streamline workflows, and save significant time while reducing the likelihood of errors in application development.

Although the model has just launched and awaits public testing, Mistral is confident that it outperforms current models, including CodeLlama 70B, DeepSeek Coder 33B, and Llama 3 70B, across most programming languages.

Impressive Performance Metrics

On RepoBench, designed to assess long-range repository-level Python code completion, Codestral achieved a 34% accuracy score, surpassing all competitors. It also excelled on HumanEval for Python code generation and CruxEval for output prediction with scores of 81.1% and 51.3%, respectively. Additionally, it outperformed other models on HumanEval for Bash, Java, and PHP.

While its performance in C++, C, and TypeScript was slightly lower, its average score of 61.5% across all tests edged ahead of Llama 3 70B's score of 61.2%. On the Spider benchmark for SQL, it ranked second with a score of 63.5%.

Prominent tools for developer productivity and AI application development, including LlamaIndex, LangChain, Continue.dev, Tabnine, and JetBrains, have begun testing Codestral.

“From our initial testing, it’s an excellent option for code generation workflows due to its speed, favorable context window, and support for tool use. We tested it with LangGraph for self-corrective code generation, and it performed exceptionally well right from the start,” said Harrison Chase, CEO and co-founder of LangChain.

Getting Started with Codestral

Mistral offers Codestral 22B on Hugging Face under a non-production license, allowing developers to use the technology for non-commercial purposes, testing, and research support.

Two API endpoints are also available. The first, codestral.mistral.ai, is intended for developers using Codestral's Instruct or Fill-In-the-Middle routes inside their IDEs; it uses an API key managed by the individual user and is free during an eight-week beta. The second, api.mistral.ai, is meant for broader research, batch queries, or third-party application development, with usage billed per token.
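As a rough illustration of what a Fill-In-the-Middle call to the first endpoint might look like, the Python sketch below builds (but does not send) a request containing the code before and after the gap. The article names only the host, so the `/v1/fim/completions` path, the `codestral-latest` model name, and the JSON field names are assumptions here and should be checked against Mistral's API documentation; `build_fim_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumptions (not stated in the article): the "/v1/fim/completions" path,
# the "codestral-latest" model name, and the JSON field names below.
FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"


def build_fim_request(prefix: str, suffix: str, api_key: str) -> urllib.request.Request:
    """Build, without sending, a fill-in-the-middle request.

    `prefix` is the code before the gap and `suffix` is the code after it;
    the model is expected to generate the text that belongs in between.
    """
    payload = {
        "model": "codestral-latest",  # assumed model identifier
        "prompt": prefix,             # code preceding the gap
        "suffix": suffix,             # code following the gap
        "max_tokens": 64,             # cap on the generated middle section
    }
    return urllib.request.Request(
        FIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_fim_request(
        prefix="def fibonacci(n):\n    ",
        suffix="\n    return result\n",
        api_key="YOUR_API_KEY",  # placeholder; use your own key
    )
    # Inspect the request instead of sending it.
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually send the request; that step is left out so the sketch can be run without a key.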

Developers can explore Codestral's capabilities through Le Chat, Mistral’s free conversational interface featuring an instructed version of the model.

Mistral’s introduction of Codestral gives enterprise researchers another significant option for speeding up software development, but how it performs against other code-centric models, such as the recently launched StarCoder2 or offerings from OpenAI and Amazon, remains to be seen.

Both OpenAI's Codex, which powers GitHub Copilot, and Amazon's CodeWhisperer are key competitors. Additionally, OpenAI's ChatGPT is increasingly used as a coding tool, while its GPT-4 Turbo model fuels Devin, a semi-autonomous coding agent by Cognition. The competitive landscape also includes Replit, which offers several small AI coding models, and Codeium, recently valued at $500 million following a $65 million Series B funding round.
