Meet CroissantLLM, a new open-source language model built to handle both English and French. It is compact enough to run on mobile devices and consumer-grade hardware, which makes it genuinely accessible. According to lead researcher Manuel Faysse, the goal is balanced bilingual proficiency, putting French on an equal footing with English in AI applications.
Aiming for a 1:1 ratio of English to French training data, CroissantLLM has 1.3 billion parameters yet was trained on three trillion tokens, more than the two trillion tokens used to train Llama 2. The French training data draws on a wide range of high-quality sources, spanning legal documents, cultural texts, scientific literature, and business information.
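To put that training budget in perspective, the short calculation below divides training tokens by parameter count. The CroissantLLM figures (1.3 billion parameters, three trillion tokens) come from this article; the Llama 2 figures (about two trillion tokens for both the 7B and 70B models) come from Meta's Llama 2 release. The helper name `tokens_per_parameter` is purely illustrative, and the ratio is a rough measure of data relative to model size, not of quality.

```python
# Rough "training tokens per parameter" comparison.
# Figures: CroissantLLM (1.3B params, 3T tokens) as reported in the article;
# Llama 2 7B and 70B were each trained on roughly 2T tokens.

def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params

MODELS = {
    "CroissantLLM 1.3B": (3e12, 1.3e9),
    "Llama 2 7B": (2e12, 7e9),
    "Llama 2 70B": (2e12, 70e9),
}

for name, (tokens, params) in MODELS.items():
    ratio = tokens_per_parameter(tokens, params)
    # Thousands separator, no decimals, e.g. "~2,308"
    print(f"{name}: ~{ratio:,.0f} tokens per parameter")
```

CroissantLLM works out to roughly 2,300 tokens per parameter, compared with under 300 for Llama 2 7B and about 30 for Llama 2 70B, which is why its token count stands out for a model of its size.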
Faysse highlights a key benefit of CroissantLLM's small size: it runs quickly on lower-end GPU servers, CPUs, and mobile devices, delivering high throughput and low latency. This addresses a significant barrier to mainstream AI adoption, namely the difficulty of running larger models. Download statistics on platforms such as Hugging Face point to the same trend: smaller models such as Llama 2 7B are often downloaded more than larger counterparts such as Llama 2 70B, thanks to their ease of use and lower operating costs.
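A back-of-the-envelope estimate shows why parameter count matters so much for deployment. The sketch below multiplies parameter count by the bytes each weight occupies at common numeric precisions. It is a simplification that counts weights only (activations, KV cache, and runtime overhead are ignored), and it makes no claim about which precisions or quantized variants CroissantLLM is actually distributed in; the function and dictionary names are hypothetical.

```python
# Approximate memory needed just to hold model weights.
# Assumption: weights only; activations, KV cache and runtime overhead are ignored.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, params in [("CroissantLLM", 1.3e9), ("Llama 2 7B", 7e9), ("Llama 2 70B", 70e9)]:
    sizes = ", ".join(
        f"{fmt} {weight_memory_gb(params, width):.1f} GB"
        for fmt, width in BYTES_PER_PARAM.items()
    )
    print(f"{name}: {sizes}")
```

At 16-bit precision, CroissantLLM's weights need about 2.6 GB, within reach of many laptops and recent phones, while Llama 2 70B needs about 140 GB, more than a single 80 GB data-center GPU can hold.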
The trade-off is that CroissantLLM gives up some of the generalist capabilities found in larger models, such as advanced reasoning, mathematics, and coding, in exchange for streamlined performance that is particularly effective in specific applications such as translation and chat.
A notable companion to CroissantLLM is FrenchBench, a new benchmark for evaluating language models in French. Its generative section, FrenchBench Gen, covers title generation, summarization, question generation, and question answering, supported by the high-quality French Question Answering Dataset (FQuAD). Its Multiple Choice section tests reasoning, factual knowledge, and linguistic proficiency.
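The article does not specify how FrenchBench scores its question-answering task. As an assumption, the sketch below shows SQuAD-style exact match and token-level F1, the metric family commonly used for extractive QA datasets modeled on SQuAD, such as FQuAD. The normalization is simplified and hypothetical: it lowercases text and strips ASCII punctuation, but it does not remove French articles or handle guillemets, so it should not be read as FrenchBench's actual scoring code.

```python
import string
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, turn ASCII punctuation into spaces, and split on whitespace.
    Simplified: French articles (le, la, les, l') are NOT removed, and
    non-ASCII punctuation such as guillemets is left untouched."""
    text = text.lower()
    text = "".join(" " if ch in string.punctuation else ch for ch in text)
    return text.split()

def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings normalize to the same token sequence."""
    return normalize(prediction) == normalize(reference)

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Convention: two empty answers match; a single empty answer scores 0.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("la tour Eiffel", "tour Eiffel")` returns about 0.8: every reference token is found (recall 1) but one predicted token is extra (precision 2/3). Token F1 gives partial credit for near-miss answers, whereas exact match does not.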
In testing, CroissantLLM performed strongly against models of similar size, establishing itself as a leading small model for French and even rivaling larger models such as Mistral 7B.
Both the Base and Chat versions of CroissantLLM are available for download on Hugging Face. A technical report detailing the model's architecture and design is available on arXiv.
With its focus on accessibility, efficiency, and bilingual proficiency, CroissantLLM is poised to make a significant contribution to AI, particularly by strengthening the place of French in technology.