When Google DeepMind introduced Gemma this past February, it launched two open models with 2 billion and 7 billion parameters. At this year's Google I/O developer conference, the company unveiled the Gemma 2 series, starting with a 27-billion-parameter model, slated for release in June.
“This 27B model was intentionally selected,” said Josh Woodward, Vice President of Google Labs, during a recent roundtable discussion. “It's optimized for Nvidia's next-gen GPUs or a single TPU host in Vertex AI, making it user-friendly. We're already seeing excellent quality, with performance surpassing models twice its size.”
Gemma is aimed at developers who want to integrate AI into apps and devices that lack extensive memory or processing power, making it well suited to resource-constrained environments such as smartphones, IoT devices, and personal computers. Since the initial launch, Google has introduced several variants: CodeGemma for code completion, RecurrentGemma for improved memory efficiency, and the recently released PaliGemma for vision-language tasks.
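For a concrete sense of that developer workflow, here is a minimal sketch of running the original instruction-tuned 2B Gemma checkpoint with the Hugging Face transformers library. The model ID, precision, and prompt are illustrative assumptions, and the gated weights require accepting Google's license terms on Hugging Face first.

```python
# Minimal sketch: text generation with a small Gemma checkpoint via
# Hugging Face transformers. Model ID and settings are assumptions for
# illustration; the weights are gated behind Google's license on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit modest hardware
    device_map="auto",           # requires the accelerate package
)

prompt = "Explain in one sentence why small models suit on-device AI."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```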
With 27 billion parameters, Gemma 2 should deliver greater accuracy and performance on more complex tasks than its predecessors. The larger parameter count, backed by a bigger training dataset, positions the model to generate higher-quality responses.
While Woodward said Gemma 2 is designed to run on a single TPU, he was referring specifically to TPU v5e, Google's latest-generation chip, released last August. In other words, Gemma 2 can run its computations on a single specialized AI chip, yielding lower latency and greater efficiency for natural-language tasks. That efficiency translates into cost savings that developers can reinvest in their applications.
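A rough back-of-envelope calculation shows why serving 27 billion parameters from one accelerator is the headline here: weight memory scales directly with numeric precision. The byte-per-parameter figures below are standard; any mapping to a particular TPU configuration is an assumption rather than a published spec.

```python
# Back-of-envelope weight-memory footprint for a 27B-parameter model at
# common inference precisions (activations and KV cache excluded).
PARAMS = 27e9

for dtype, bytes_per_param in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gigabytes = PARAMS * bytes_per_param / 1e9
    print(f"{dtype:>8}: ~{gigabytes:.0f} GB of weights")
# float32: ~108 GB; bfloat16: ~54 GB; int8: ~27 GB. Lower precision is
# what brings a model this size within reach of a single host.
```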
Gemma 2's debut comes on the heels of OpenAI's launch of GPT-4o, the multimodal LLM that OpenAI has billed as a “significant upgrade,” especially for free ChatGPT users.