Today, Cohere for AI (C4AI), the non-profit research division of the Canadian enterprise AI startup Cohere, announced the open-weights release of Aya 23, a cutting-edge family of multilingual language models.
Aya 23 is available in two variants, with 8 billion (8B) and 35 billion (35B) parameters. Parameters are the learned weights that set the strength of connections between a model's artificial neurons; a higher count generally indicates a larger, more capable model. The release is part of C4AI's Aya initiative, which aims to expand the multilingual capabilities of language models.
C4AI has openly released the weights of Aya 23, allowing third-party researchers to fine-tune the models for their own needs. While this falls short of a full open-source release (which would also include the training data and the details of how the models were built), it offers considerable flexibility, much as Meta's Llama models do.
Building on its predecessor, Aya 101, Aya 23 supports 23 languages: Arabic, Chinese (simplified and traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
Cohere for AI claims that these models extend state-of-the-art language modeling capabilities to almost half of the world’s population. Additionally, Aya 23 outperforms not only Aya 101 but also other open models like Google’s Gemma and Mistral’s offerings, providing higher-quality responses across the supported languages.
Breaking Language Barriers with Aya
While large language models (LLMs) have gained traction in recent years, most have focused primarily on English. As a result, many models struggle with less-resourced languages.
C4AI researchers identified two key gaps: a scarcity of robust multilingual pre-trained models and a lack of diverse instruction-style training data. To close them, C4AI launched the Aya initiative, partnering with over 3,000 independent researchers across 119 countries. Its first result was the Aya Collection, a vast multilingual instruction-style dataset of 513 million prompts and completions, which was then used to build Aya 101, an instruction-tuned LLM covering 101 languages.
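To make the dataset's scale concrete, here is a minimal sketch of how a researcher might stream a slice of the Aya Collection with the Hugging Face datasets library. The repository ID and subset name below are illustrative assumptions, not confirmed identifiers from the release; check the dataset card for the exact names.

```python
# A minimal sketch of browsing the Aya Collection via the Hugging Face
# `datasets` library. "CohereForAI/aya_collection" and the "aya_dataset"
# subset name are assumptions for illustration.
from datasets import load_dataset

# Stream rather than download: the full collection holds
# 513 million prompt/completion pairs.
collection = load_dataset(
    "CohereForAI/aya_collection",  # assumed repository ID
    "aya_dataset",                 # assumed subset name
    split="train",
    streaming=True,
)

# Inspect a few multilingual instruction-style examples.
for example in collection.take(3):
    print(example)
```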
Released in February 2024, Aya 101 marked a significant advancement in multilingual language modeling. However, it was built on mT5, which is now outdated, and its broad design diluted performance across individual languages.
With Aya 23, Cohere for AI has shifted to a more balanced approach, concentrating on 23 languages to deepen performance on each. The new models are built on Cohere's Command model family and fine-tuned on the Aya Collection, improving generation quality by devoting capacity to fewer languages.
Evaluation results indicate that Aya 23 outperforms Aya 101 as well as widely used models like Gemma and Mistral across a range of discriminative and generative tasks. Relative to Aya 101, the gains reach up to 14% on discriminative tasks, up to 20% on generative tasks, and 41.6% on multilingual MMLU; notably, Aya 23 also delivers a 6.6x improvement in multilingual mathematical reasoning.
Accessible Now
Cohere for AI has taken another important step toward high-performance multilingual models. The open weights for the 8B and 35B models are now available on Hugging Face under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
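For readers who want to try the weights directly, the following is a minimal sketch of loading the 8B checkpoint with the Hugging Face Transformers library. The repository ID is an assumption for illustration; the model card lists the exact name, the license terms, and hardware requirements, and the 35B variant works analogously.

```python
# A minimal sketch of running the 8B checkpoint with Hugging Face Transformers.
# The repository ID below is an assumption; verify it on the model card along
# with the CC BY-NC 4.0 license terms before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repo ID; the 35B variant is analogous

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",
)

# Aya 23 is instruction-tuned, so format the prompt with the model's chat template.
messages = [
    {"role": "user", "content": "Translate to Turkish: 'The weather is nice today.'"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```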
“By releasing the weights of the Aya 23 model family, we aim to empower researchers and practitioners to advance multilingual models and applications,” the researchers stated. Users can also experiment with the new models for free on the Cohere Playground.