Everyone is buzzing about Nvidia's blockbuster earnings report, with revenue up 265% year over year. But don't overlook Groq, a Silicon Valley startup building AI chips for large language model (LLM) inference, the process of running existing models to make predictions rather than training new ones. Last weekend, Groq enjoyed the kind of viral attention most startups can only dream of.
It wasn't as sensational as one of Elon Musk's posts about Grok, his unrelated large language model, but Nvidia likely took note when Matt Shumer, CEO of HyperWrite, posted on X about Groq's “wild tech,” which he said serves Mixtral at nearly 500 tokens per second (tok/s) with near-instant responses. (At that rate, each token takes roughly two milliseconds, so an answer several hundred tokens long streams in well under a second.)
Shumer also showcased a “lightning-fast answers engine” on X, delivering “factual, cited answers with hundreds of words in less than a second.” The posts drove widespread interest in Groq's chat application, where users can compare output generated by Llama and Mistral LLMs. The buzz followed an interview in which Groq CEO Jonathan Ross demonstrated how Groq powers an audio chat interface that “breaks speed records.”
For now, no company rivals Nvidia, which holds more than 80% of the high-end AI chip market. Other AI chip startups, including SambaNova and Cerebras, have struggled to gain traction even after moving into AI inference. With Nvidia reporting $22 billion in fourth-quarter revenue, Ross argued that Groq offers a "super-fast," cost-effective option purpose-built for LLMs, targeting the prohibitive cost of inference.
Ross boldly stated, “We are probably going to be the infrastructure that most startups are using by the end of the year,” and encouraged startups to reach out for competitive pricing.
Groq LPUs vs. Nvidia GPUs
Groq describes its LPUs, or language processing units, as a new kind of end-to-end processing unit system optimized for the fast inference that AI language applications require. Unlike Nvidia GPUs, which are optimized for parallel processing (originally for graphics), Groq's LPUs are designed to handle sequences of data, such as code and natural language, and to generate output faster by working around the two bottlenecks that limit LLMs on conventional GPUs and CPUs: compute density and memory bandwidth.
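To make the distinction concrete, here is a minimal, illustrative Python sketch (not Groq's or Nvidia's code) of why LLM inference is sequential: each new token depends on every token generated before it, so the work cannot be spread across output positions the way graphics workloads can, and speed ends up bounded by how fast the chip can stream model weights from memory at every step.

```python
# Illustrative sketch only -- not Groq's or Nvidia's implementation.
# Autoregressive LLM inference: each token requires a full forward pass
# that depends on all previously generated tokens, so generation is a
# strictly sequential loop. The model's weights are re-read on every
# step, which is why memory bandwidth (not just raw compute) typically
# limits tokens per second.

def generate(model, prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = model(tokens)   # full forward pass over the sequence so far
        tokens.append(next_token)    # the next step cannot start until this one ends
    return tokens

# Toy "model" (always predicts token 0) so the sketch runs end to end.
print(generate(lambda toks: 0, [1, 2, 3]))
```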
Moreover, Ross noted that Groq differentiates itself from companies like OpenAI by not training models, meaning it can maintain user privacy by avoiding the logging of chat queries.
With estimates that ChatGPT could run over 13 times faster using Groq chips, could OpenAI be a future partner? While Ross did not confirm any specific collaborations, he mentioned that a partnership could be beneficial if both parties share common goals.
Is Groq's LPU Truly a Game-Changer in AI Inference?
I had been keen to speak with Ross since December when Groq was touted as the “US chipmaker poised to win the AI race.” Now, I was eager to understand if Groq's LPUs are genuinely a breakthrough in AI inference or just another fleeting trend driven by PR hype.
Ross described Shumer's posts as "the match that lit the fuse," noting that over 3,000 individuals reached out for API access within 24 hours. “We’re letting people use it for free at the moment,” he added.
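For a sense of what those developers would be working with, here is a minimal sketch of calling a hosted LLM inference service from Python. It assumes an OpenAI-compatible chat endpoint; the base URL, model name, and environment variable below are illustrative assumptions, not confirmed details of Groq's API.

```python
# Minimal sketch of calling a hosted LLM inference API.
# Assumption: the provider exposes an OpenAI-compatible chat endpoint.
# The base_url, model name, and env var are illustrative placeholders,
# not confirmed details of Groq's API.
import os
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.example-inference-provider.com/v1",  # hypothetical endpoint
    api_key=os.environ["INFERENCE_API_KEY"],                   # hypothetical env var
)

response = client.chat.completions.create(
    model="mixtral-8x7b",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize what an LPU is in one sentence."}],
)
print(response.choices[0].message.content)
```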
Ross is no newcomer to the startup scene; he co-invented Google's tensor processing unit (TPU) before founding Groq in 2016. He explained that Groq’s approach is unique: “If you’re building a car, you can start with the engine or the driving experience. We started with the driving experience, spending the first six months focusing on developing a sophisticated compiler.”
The demand for Nvidia GPUs has surged across the AI industry, creating a lucrative market. New GPU cloud services have emerged, while former GitHub CEO Nat Friedman recently mentioned a marketplace for GPU clusters. Reports indicate that OpenAI’s CEO Sam Altman plans to address AI chip demands through a massive project with a staggering price tag and complicated geopolitical implications.
Ross believes the current GPU frenzy is partly a response to Groq. “There's a little bit of a virtuous cycle,” he said, pointing to Nvidia's dealings with sovereign nations as the same kind of negotiations he himself is about to pursue on a global tour of his own.
When asked about Altman's ambition for a $7 trillion AI chip initiative, Ross confidently claimed, “We could do it for $700 billion. We’re a bargain.”
Groq also aims to expand AI chip supply. “[By year's end], we will definitely have a capacity of 25 million tokens per second, which is where we estimate OpenAI was at the end of 2023,” he stated, adding that the company is in discussions with various countries to grow that capacity.
However, Groq must also tackle more mundane challenges, such as setting up API billing to handle its sudden surge in interest. When I asked about the company's billing plans, Ross replied, “We'll look into it,” and his PR representative quickly added, “Yes, that will be one of the first orders of business.”