Fine-tuning is essential for enhancing large language model (LLM) outputs and aligning them with specific enterprise needs. When executed properly, this process yields more accurate and valuable model responses, enabling organizations to maximize their generative AI applications.
However, fine-tuning can be costly, creating barriers for some enterprises seeking to benefit from these advanced capabilities.
Mistral, an open-source AI model provider that is rapidly approaching a $6 billion valuation just 14 months post-launch, is stepping into the fine-tuning arena. Their new AI developer platform, La Plateforme, introduces enhanced customization tools designed to streamline fine-tuning processes, reduce training costs, and lower entry barriers.
With a name reflecting a strong wind in southern France, Mistral is making waves in the AI landscape, continually innovating and attracting significant funding. The company highlights in a recent blog post that fine-tuning smaller models for specific domains can enhance performance while minimizing deployment costs and expediting application speed.
Tailoring Mistral Models for Increased Customization
Mistral has established itself by releasing robust LLMs under open-source licenses, allowing for free adaptation. It also offers paid services, including an API and the La Plateforme developer platform. This enables users to build applications using Mistral models without the need for extensive server setups; they can make API calls to leverage Mistral's capabilities.
Now, customers can customize Mistral models on La Plateforme, utilize open-source code from Mistral on GitHub, or access custom training services.
For developers wanting to work independently on their infrastructure, Mistral has launched the lightweight codebase, mistral-finetune, which employs the LoRA paradigm to minimize the number of trainable parameters.
Mistral notes, “With mistral-finetune, you can fine-tune all our open-source models on your infrastructure without sacrificing performance or memory efficiency.”
For those interested in serverless fine-tuning, Mistral offers new services that leverage refined research and development techniques. LoRA adapters help preserve the foundational knowledge of models while enabling efficient deployments.
Mistral describes this as a significant advancement in making sophisticated scientific methods accessible to AI application developers, allowing for rapid and cost-effective model customization.
Fine-tuning services are compatible with Mistral’s 7.3 billion parameter model, Mistral 7B, and Mistral Small. Current users can utilize Mistral’s API for immediate customization, with plans to introduce more models for fine-tuning in the upcoming weeks.
Additionally, Mistral's custom training services optimize AI models for specific applications using proprietary data, often employing cutting-edge techniques such as continuous pretraining to incorporate specialized knowledge.
This approach facilitates the development of highly specialized and efficient models tailored to particular domains.
To celebrate these new offerings, Mistral has launched an AI fine-tuning hackathon running until June 30, encouraging developers to experiment with the startup's innovative fine-tuning API.
Mistral's Unprecedented Growth and Innovation
Since its inception in April 2023 by former Google DeepMind and Meta employees Arthur Mensch, Guillaume Lample, and Timothée Lacroix, Mistral has experienced rapid growth. The company secured a record-setting $118 million seed round—the largest in Europe's history—and quickly formed partnerships with major players like IBM. In February, Mistral Large was made available through a collaboration with Microsoft on Azure cloud.
Recently, SAP and Cisco revealed their support for Mistral, and last month, the company launched Codestral, its first code-centric LLM, claiming it surpasses all competitors. Mistral is also nearing a substantial $600 million funding round, which would elevate its valuation to $6 billion.
Positioned as a direct competitor to OpenAI and Meta's Llama 3, Mistral Large is noted as the second most capable commercial language model globally, following OpenAI’s GPT-4. Mistral 7B, introduced in September 2023, claims to outperform Llama on several benchmarks and closely matches CodeLlama 7B performance in coding tasks.
What innovations will Mistral unveil next? We’ll find out soon.