Hugging Face CEO Clem Delangue recently shared an insightful prediction regarding the future of small language models (SLMs), stating, “In 2024, most companies will realize that smaller, cheaper, more specialized models make more sense for 99% of AI use cases. The current market is misled by companies sponsoring the costs of training and operating large models through APIs, particularly with cloud incentives.” This sentiment is supported by the momentum seen in Microsoft's recent business activities. In their latest earnings call, Microsoft reported that a variety of clients—including Anker, Ashley, AT&T, EY, and Thomson Reuters—are exploring SLMs for generative AI application development. CEO Satya Nadella emphasized, “Microsoft loves SLMs.”
What’s fueling this enthusiasm for SLMs? Generally, these models are five to ten times smaller than their large language model (LLM) counterparts, yet they deliver remarkable advantages. Sudhakar Muddu, CEO and cofounder of Aisera, explains, “SLMs consume less energy and have lower latency. Their training and inference times are faster. Additionally, their compact size allows for deployment on edge devices. However, the most significant benefit for enterprises is their ability to be tailored for specific domains and industries, which can lead to substantial productivity gains.”
Despite their potential, Muddu acknowledges challenges within the SLM landscape. The technology is still evolving and can be complex to implement.
### Common Challenges and Solutions for SLMs
#### 1. Performance
SLMs are quickly bridging the performance gap with LLMs, particularly in terms of accuracy. However, some differences remain that can affect application performance. According to David Guarrera, a principal at EY Americas Technology Consulting, “Their limited understanding and contextual awareness often lead them to struggle with complex or niche topics. This can result in responses that are not as relevant or coherent as those generated by larger models.” Organizations must therefore carefully weigh the trade-offs between SLMs and LLMs. Out of the box, these models often perform suboptimally; fine-tuning can significantly improve an SLM's performance.
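One reason fine-tuning is tractable for SLMs is that parameter-efficient methods such as LoRA train only a small fraction of the weights. The back-of-the-envelope sketch below is illustrative only: the model dimensions are assumptions, not the specifications of any model mentioned in this article.

```python
# Illustrative arithmetic: trainable parameters when adding rank-r LoRA
# adapters to the query and value projections of each attention layer,
# versus full fine-tuning. All dimensions below are assumed for the sake
# of the example.

def lora_trainable_params(d_model: int, n_layers: int, rank: int) -> int:
    """Parameters in LoRA adapters on the Q and V projections."""
    # Each adapted d_model x d_model weight gains two low-rank factors:
    # A (d_model x r) and B (r x d_model).
    per_matrix = 2 * d_model * rank
    matrices_per_layer = 2  # query and value projections
    return n_layers * matrices_per_layer * per_matrix

# A hypothetical ~3B-parameter SLM: 32 layers, hidden size 3072.
full_model_params = 3_000_000_000
adapter_params = lora_trainable_params(d_model=3072, n_layers=32, rank=8)

print(f"LoRA trainable params: {adapter_params:,}")        # 3,145,728
print(f"Fraction of full model: {adapter_params / full_model_params:.4%}")
```

Training roughly 0.1% of the weights is what makes domain adaptation feasible on modest hardware, though data curation remains the harder problem, as discussed below.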
#### 2. Expertise
One effective strategy for optimizing SLMs is retrieval-augmented generation (RAG), which uses semantic search, typically backed by a vector database, to retrieve relevant data. This improves the accuracy of the generated content and yields more up-to-date results. Cory Hymel, vice president of research and innovation at Crowdbotics, states, “Any backend developer can build an MVP or initial version of a RAG GenAI setup with the current tools.” However, advancing beyond RAG demands specialized expertise in AI, a resource that's increasingly scarce. “Fine-tuning a model involves integrating unique training data to optimize it for a specific dataset. This process is more complex and necessitates custom data curation and tagging,” Hymel explains. Additionally, enterprise applications may need to manage numerous SLMs, complicating the architecture and potentially raising costs, time to market, and upfront investments.
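The retrieval step at the heart of RAG can be sketched in a few lines. Production systems use learned embeddings and a vector database; in this toy version, bag-of-words vectors stand in for real embeddings so the example is self-contained, and the documents and prompt format are invented for illustration.

```python
# Minimal sketch of RAG's retrieval step: rank documents by similarity
# to the query, then prepend the best match to the model prompt.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a word-count vector (a real system would call an
    # embedding model here).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Refund requests must be filed within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Passwords must be rotated every 90 days.",
]
context = retrieve("How do I get a refund?", docs)
# The retrieved context is prepended to the SLM's prompt:
prompt = f"Context: {context[0]}\n\nQuestion: How do I get a refund?"
print(prompt)
```

Because the model answers from retrieved context rather than memorized training data, results stay current as the document store changes, which is the property Hymel's "MVP" observation relies on.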
#### 3. Security
A key advantage of many SLMs being open source is the increased control over security measures. Enterprises can deploy SLMs in on-premise environments, but concerns remain. Mehrin Kiani, an ML scientist at Protect AI, warns, “The primary security risk when using a fine-tuned SLM is data theft and privacy concerns, especially when the model is trained on proprietary and confidential information.” Open source code can also broaden the attack surface: if project maintainers lack adequate security resources, vulnerabilities may go unpatched, inviting attacks.
To address these risks, Tal Furman, director of data science and deep learning at Deep Instinct, suggests, “Training models on adversarial examples and establishing detection mechanisms can help identify and mitigate malicious inputs. Implementing strong access controls, logging, and monitoring for open-source models is also essential.” For any software dealing with sensitive information, comprehensive security reviews should be integral to every stage of fine-tuning and operationalization of the SLM. However, Kiani cautions that “no security measure can ensure complete security for SLM-based applications. Enhancing security posture starts with designing applications using security-first principles. Ultimately, an insecure generative AI application is futile, irrespective of its capabilities.”
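Two of the controls Furman names, input filtering and logging, can be wrapped around inference in a thin guard layer. The sketch below is a hedged illustration: the model call is a stub, the blocklist is deliberately naive, and all names are invented rather than taken from any real library.

```python
# Hypothetical guard layer around SLM inference: reject suspicious
# prompts, and log an audit trail without persisting raw prompt text.
import hashlib
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("slm.audit")

# Naive stand-in for a real prompt-injection detector.
BLOCKLIST = ("ignore previous instructions",)

def guarded_generate(model_fn, user_id: str, prompt: str) -> str:
    lowered = prompt.lower()
    if any(pattern in lowered for pattern in BLOCKLIST):
        log.warning("blocked prompt from user=%s", user_id)
        raise ValueError("prompt rejected by input filter")
    start = time.monotonic()
    output = model_fn(prompt)
    # Log a hash rather than the prompt itself, since the prompt may
    # contain the proprietary data Kiani warns about.
    log.info(
        "user=%s prompt_sha256=%s latency_ms=%.1f",
        user_id,
        hashlib.sha256(prompt.encode()).hexdigest()[:12],
        (time.monotonic() - start) * 1000,
    )
    return output

# Stub model for demonstration only.
echo_model = lambda p: f"[SLM output for: {p}]"
print(guarded_generate(echo_model, "u42", "Summarize our refund policy."))
```

A real deployment would add authentication, rate limiting, and anomaly detection on the logged metrics, but the structure — validate, execute, record — stays the same at every scale.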
As organizations navigate the evolving landscape of small language models, understanding both their potential and limitations is crucial for harnessing the power of generative AI effectively.