The Impact of Size on Large Language Models (LLMs)
When it comes to large language models (LLMs), size is crucial as it determines where a model can operate effectively.
Stability AI, renowned for its Stable Diffusion text-to-image generative AI technology, has just launched one of its smallest models yet: Stable LM 2 1.6B. The Stable LM text-generation line first debuted in April 2023 with 3 billion and 7 billion parameter versions. The 1.6B model is the company's second release of 2024, following the earlier launch of Stability AI's Stable Code 3B.
Introducing the Compact Stable LM 2 Model
The new Stable LM 2 1.6B is designed to lower barriers for developers and accelerate participation in the generative AI ecosystem. This compact yet powerful model supports multilingual text generation in seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch. It utilizes recent advancements in algorithmic language modeling to achieve an optimal balance between speed and performance.
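As a rough sketch of how a developer might try the model locally, the snippet below loads it with the Hugging Face transformers library and generates a short completion. The repository id, prompt, and sampling settings are assumptions for illustration, not details taken from the announcement.

```python
# Minimal sketch: generating text with Stable LM 2 1.6B via Hugging Face transformers.
# The model id below is an assumption; check Stability AI's Hugging Face page for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The model targets seven languages, so prompts can be in any of them.
prompt = "Los modelos de lenguaje pequeños son útiles porque"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```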
Carlos Riquelme, the Head of the Language Team at Stability AI, stated, “Generally, larger models trained on similar data perform better than smaller ones. However, as models implement improved algorithms and are trained on quality data, we often see smaller models outshining their older, larger counterparts.”
Why Smaller Models Can Outperform Larger Ones
According to Stability AI, Stable LM 2 1.6B outperforms other small language models of under 2 billion parameters, such as TinyLlama 1.1B and Falcon 1B, across various benchmarks, and even edges out the somewhat larger Microsoft Phi-2 (2.7B). Remarkably, it also surpasses Stability AI's own earlier Stable LM 3B model.
“Stable LM 2 1.6B performs better than some larger models trained just months ago,” Riquelme noted. “Just like in computing technology, we’re seeing models that are becoming smaller, thinner, and better over time.”
Acknowledging Limitations
Although the smaller Stable LM 2 1.6B has impressive capabilities, its size does come with some limitations. Stability AI warns that, “due to the inherent nature of small, low-capacity language models, Stable LM 2 1.6B may exhibit common issues such as higher hallucination rates or potential toxic language.”
Transparency and Enhanced Data Training
Stability AI has been focusing on smaller yet more powerful LLM options for several months. In December 2023, it released the StableLM Zephyr 3B model, which delivered improved performance in a smaller footprint than the original Stable LM release.
Riquelme explained that the new Stable LM 2 models utilize more data, incorporating multilingual documents in six languages beyond English. He emphasized the importance of the order in which data is presented during training, suggesting that varied data types across different training stages could improve outcomes.
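To make the idea of staged data ordering concrete, here is a purely illustrative Python sketch of cycling through training stages with different data mixes; the stage names, sources, and proportions are hypothetical and do not reflect Stability AI's actual recipe.

```python
# Purely illustrative: presenting different data mixes at different training stages.
# Stage names, data sources, and weights are hypothetical.
from typing import Dict, List

stages: List[Dict] = [
    {"name": "stage_1_general", "mix": {"english_web": 0.8, "multilingual_web": 0.2}},
    {"name": "stage_2_multilingual", "mix": {"english_web": 0.5, "multilingual_web": 0.5}},
    {"name": "stage_3_high_quality", "mix": {"curated_text": 0.7, "multilingual_web": 0.3}},
]

for stage in stages:
    print(f"--- {stage['name']} ---")
    for source, weight in stage["mix"].items():
        # In a real pipeline, this weight would control how often batches
        # are sampled from each source during this stage of training.
        print(f"sampling {weight:.0%} of batches from {source}")
```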
To further facilitate development, Stability AI is releasing these models in both pre-trained and fine-tuned formats, along with what researchers describe as “the last model checkpoint before the pre-training cooldown.”
“Our goal is to provide tools for developers to innovate and build on our current models,” Riquelme said. “We're offering a specific half-cooked model for experimentation.”
He elaborated on the training process, explaining that as the model is updated sequentially, its performance improves. The initial model lacks knowledge, while subsequent versions accumulate insights from the data. However, Riquelme also noted that models may become less pliable toward the end of training.
“We decided to provide the model in its pre-final training form to make it easier for users to specialize it for different tasks or datasets. While we can't guarantee success, we believe in people's creativity in utilizing new tools in innovative ways.”
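For readers who want to experiment with specializing such a “half-cooked” checkpoint, the sketch below shows one conventional way to continue training a causal language model on a small custom corpus using PyTorch and transformers; the checkpoint id, corpus, and hyperparameters are assumptions rather than Stability AI's recommended procedure.

```python
# Sketch: specializing a pre-final ("half-cooked") checkpoint on custom text.
# The checkpoint id, example corpus, and hyperparameters are hypothetical.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "stabilityai/stablelm-2-1_6b"  # assumed id of the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
model.train()

# Tiny in-memory corpus standing in for a domain-specific dataset.
corpus = [
    "Domain-specific sentence one.",
    "Domain-specific sentence two.",
]

optimizer = AdamW(model.parameters(), lr=1e-5)

for epoch in range(1):
    for text in corpus:
        batch = tokenizer(text, return_tensors="pt")
        # Causal LM loss: the labels are the input ids themselves.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("stablelm2-specialized")  # hypothetical output directory
tokenizer.save_pretrained("stablelm2-specialized")
```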