Helsinki-based AI startup Silo AI has generated excitement this week with the launch of Poro, an open-source large language model (LLM) focused on enhancing multilingual AI capabilities for European languages.
Poro is the first in a planned series of open-source models intended to support all 24 official languages of the European Union. Developed by SiloGen, Silo AI’s generative AI division, in collaboration with the University of Turku’s TurkuNLP research group, Poro aims to strengthen language processing for European languages across the continent.
“It is a question of digital sovereignty,” stated Peter Sarlin, CEO of Silo AI. “We want models that embody European values, culture, and languages. Our goal is to enable European companies—and any organization—to create proprietary models that maintain their value within Europe.”
The Poro 34B model, with 34.2 billion parameters, takes its name from the Finnish word for “reindeer.” It uses the BLOOM transformer architecture with ALiBi (Attention with Linear Biases) in place of positional embeddings, and is being trained on a dataset of 21 trillion multilingual tokens covering English, Finnish, and programming languages such as Python and Java.
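ALiBi drops position embeddings entirely and instead adds a fixed, head-specific linear penalty to the attention scores: a query at position i attending to an earlier key at position j receives a bias of -m·(i - j), where m is a per-head slope, so more distant tokens are penalized more. The Python sketch below follows the slope schedule described in the original ALiBi paper (a geometric sequence starting at 2^(-8/n) for n heads). It is an illustration of the technique under that assumption, not Poro's or BLOOM's actual code, and the function names are hypothetical.

```python
import math

# Illustrative ALiBi sketch; function names are hypothetical, not taken from Poro or BLOOM.


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes m, using the geometric schedule from the ALiBi paper."""

    def power_of_2_slopes(n: int) -> list[float]:
        # For n heads the slopes are 2^(-8/n), 2^(-16/n), ..., 2^(-8).
        start = 2 ** (-8 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_2_slopes(num_heads)
    # Otherwise take the closest lower power of two, then fill the remaining
    # heads with every other slope from the schedule for twice that many heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_2_slopes(closest) + extra


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Causal ALiBi bias indexed as bias[head][query][key].

    The bias is added to the raw attention scores (q . k / sqrt(d)) before softmax.
    """
    bias = []
    for m in alibi_slopes(num_heads):
        head = []
        for i in range(seq_len):  # query position
            row = [
                m * (j - i) if j <= i else float("-inf")  # -m*(i-j); future keys masked
                for j in range(seq_len)  # key position
            ]
            head.append(row)
        bias.append(head)
    return bias


if __name__ == "__main__":
    print(alibi_slopes(8))
    # Bias added to head 0's attention scores for a 4-token sequence.
    for row in alibi_bias(8, 4)[0]:
        print(row)
```

Because softmax is unchanged when the same constant is added to every score in a row, real implementations may express this bias in an algebraically different but equivalent form; the key property is that the penalty grows linearly with distance and needs no learned parameters.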
Poro is being trained on LUMI, Europe's most powerful supercomputer, located in Kajaani, Finland. The training run uses 512 of LUMI's AMD Instinct MI250X GPUs, providing 74 petaflops of computing power.
Sarlin emphasized that Poro tackles a significant challenge: training effective language models for lower-resourced European languages such as Finnish. The model uses a cross-lingual training strategy that draws on data from higher-resourced languages like English, so that what it learns from abundant English text can carry over to Finnish.
Poro is the second major open-source LLM to emerge from Europe, following Mistral 7B from the heavily funded French startup Mistral AI. Its launch underscores Europe’s growing presence in the rapidly evolving generative AI landscape and signals intensifying competition among AI developers.
Poro Research Checkpoints
SiloGen is committed to transparency through the Poro Research Checkpoints program, documenting the model's training journey. “We will release checkpoints throughout the training process, a relatively new approach,” explained Sarlin. “Such transparency in model training is uncommon.”
The initial checkpoint from Poro 34B captures the first 30% of its training. Preliminary benchmarks indicate that Poro is already achieving state-of-the-art results at this phase. In the FIN-bench evaluation for Finnish, Poro surpasses specialized monolingual Finnish models like FinGPT.
“The model has demonstrated superior performance for low-resource languages with just 30% of training completed,” Sarlin noted. By learning patterns shared across languages, Poro performs well even where training data is limited.
Poro's multilingual capabilities also appear not to compromise its English performance. Testing shows that it outperforms existing models on Finnish benchmarks and is on track to match or exceed comparable models on English benchmarks.
An Open-source Alternative to Big Tech
Sarlin advocates for open-source models like Poro as the future of AI, providing a transparent and ethical alternative to proprietary models from tech giants. “I believe we will see numerous open-source alternatives emerge,” he said. “The most secure future is one rooted in open-source, with clear visibility into model construction and architecture.”
He added that significant effort has gone into ensuring that both the data and the model comply with regulatory requirements by design. Alongside regular checkpoint releases during training, Silo AI aims to build an extensive family of open-source models for all European languages.
Collaborating with the University of Turku
The development of Poro reflects a fruitful partnership between Silo AI and the University of Turku, where researchers from TurkuNLP have pioneered open-source resources for the Finnish language. “My research group and several professors joined forces to scale the company using revenue funding,” Sarlin shared. “With over 300 employees, most of whom hold PhDs in AI-related fields, we differ considerably from many others in the industry.”
This collaboration combines Silo AI’s practical AI expertise with the university’s leading research in multilingual modeling, offering an example of effective industry-academia cooperation in improving AI capabilities for lower-resourced European languages.
Is Europe Poised to Lead in Open-source AI?
The launch of Poro could mark the beginning of a new phase of open collaboration and transparency in natural language processing. Initiatives like Poro Research Checkpoints offer insights and resources that have largely been confined to major tech companies.
“We partner with clients such as Allianz, Rolls-Royce, Honda, and Philips, and we have heard concerns from large enterprises regarding future regulations and the models they can utilize,” said Sarlin.
If Poro lives up to its potential, it could democratize access to powerful multilingual models, offering Europe a native alternative to US tech giants. While it's still early, Poro represents a significant step toward making language AI accessible and open, moving it out of proprietary silos and into the open.