Microsoft and Nvidia recently unveiled their latest small language models, Phi-3.5-mini-instruct and Mistral-NeMo-Minitron 8B, underscoring the tech industry's shift in focus from large-scale models to smaller ones. This shift is not merely a trend; it is a response to the escalating computing and energy demands of large models.
The appeal of these small language models (SLMs) lies in the balance they strike between resource efficiency and functional performance; in many cases, their capabilities rival those of their larger counterparts. Clément Delangue, CEO of AI startup Hugging Face and named a global AI leader by TIME magazine, suggests that up to 99% of use cases could be addressed effectively with SLMs, and he predicts that 2024 will be the year of the SLM. Major tech companies, including Google, Microsoft, and Meta, have already launched nine small models this year. Apple, too, has favored running small models on-device to enhance the user experience.
The rise of small models is closely linked to the challenges posed by large ones, particularly the resource expenditure required for further performance gains. The computational power and energy needed to train and operate large models have raised the barrier to entry for smaller organizations and individuals who wish to participate in their development. The International Energy Agency estimates that by 2026, the combined power consumption of data centers, cryptocurrency, and AI-related operations could equal the total electricity usage of Japan.
One significant drawback of large models is their tendency to produce "hallucinations": outputs that appear plausible but are factually incorrect. Despite these issues, large models remain a dominant trend in the industry. Figures such as Zhou Hongyi, founder and chairman of 360 Group, and Robin Li, founder and CEO of Baidu, maintain that large models will shape the future of the internet and drive innovation across many sectors.
Small models, however, have distinct advantages, especially in efficiency and specialization. Because they are typically trained on narrower, curated data for specific applications, they are less prone to hallucinations within their domains. While small models may not match large models in every respect, they excel in targeted roles; outside those specialized domains, though, their capabilities are limited by the narrower scope of their training data.
Industry experts emphasize that large and small models serve distinct purposes and differ fundamentally in design. Consequently, while small models are gaining traction, they will not fully replace large models across the broad landscape of artificial intelligence.