If your target audience speaks 22 official languages and more than 19,000 dialects, is a text-only AI chatbot that excels in just a few languages truly adequate? That is the problem Indian AI startup Sarvam is trying to solve. On Tuesday, the company unveiled a new range of products, including a voice-enabled AI bot that supports more than 10 Indian languages. Sarvam is betting that users in India would rather speak to an AI in their native language than type to it.
“People prefer to converse in their own language. Typing in Indian languages today is quite challenging,” said Vivek Raghavan, co-founder of Sarvam AI. The Bengaluru-based startup primarily targets businesses and enterprises, marketing its voice-enabled AI bots across various sectors, especially those that rely on customer support. One customer is Sri Mandir, a startup that provides religious content and has used Sarvam’s AI agent to accept payments, processing more than 270,000 transactions.
Sarvam’s AI voice agents can be integrated into WhatsApp, mobile apps, and even traditional voice calls. Backed by Peak XV and Lightspeed, the startup is pricing its AI agents competitively, starting at ₹1 (roughly 1 US cent) per minute.
The foundation of Sarvam’s voice-enabled AI agents is a small language model called Sarvam 2B, trained on a dataset of 4 trillion tokens. The model was trained entirely on synthetic data, a choice that has generated some debate among experts. While synthetic data, which is produced by large language models to emulate real-world information, can be prone to inaccuracies, Raghavan noted that Sarvam chose this approach because of the limited availability of Indian language content online. The company has also built models to clean and improve the initial data used to generate the synthetic datasets.
Raghavan emphasized the cost-effectiveness of Sarvam 2B, stating that it will be priced at a fraction of comparable industry models. The startup also plans to open-source the model, inviting the community to develop it further.
“While large foundational models are thrilling, smaller language models can provide a superior, more tailored, cost-effective experience with reduced latency,” Raghavan explained. “For occasional queries, large models might suffice, but for frequent daily interactions, smaller models are more appropriate.”
In addition to the AI bot, Sarvam is releasing an audio-language model named Shuka, built on its Saaras v1 audio encoder and Meta’s Llama-3-8B Instruct. The model is being open-sourced, and developers can use the startup’s translation, text-to-speech (TTS), and other modules to create voice interfaces.
Another innovative product, labeled “A1,” serves as a generative AI workbench tailored for lawyers to consult regulations, draft and redact documents, and extract necessary data.
Sarvam is among a select group of Indian startups championing solutions that align with the nation’s interests, contributing to the government’s initiative to develop its own AI infrastructure. Countries around the world are increasingly pursuing “sovereign AI,” meaning AI that is developed and controlled at the national level, with the aim of preserving data privacy, boosting economic growth, and adapting AI capabilities to local cultural contexts. The United States and China currently lead in investment, and India is pursuing a similar path with its “IndiaAI” program and language-focused models.
One initiative within the IndiaAI framework is the IndiaAI Compute Capacity, which is intended to build a supercomputer powered by at least 10,000 GPUs. One of the models under development, called Bhashini, aims to make digital services accessible across a variety of Indian languages. Raghavan expressed Sarvam’s readiness to support the IndiaAI program: “If the opportunity arises, we will collaborate with the government,” he said.