Google Unveils Open Source Tools for Enhanced AI Model Development

Google Embraces Open Source at Cloud Next: A Shift Towards Generative AI

In a typical year, Cloud Next—one of Google’s two major annual developer conferences, alongside I/O—primarily showcases managed and proprietary products that are shielded behind locked-down APIs. However, this year, in an effort to build goodwill among developers and elevate its ecosystem ambitions, Google introduced several open-source tools specifically designed to support generative AI projects and infrastructure.

The first noteworthy release, MaxDiffusion, was actually quietly launched in February. It is a collection of reference implementations of various diffusion models, such as the popular image generator Stable Diffusion, that run on XLA devices. XLA, short for Accelerated Linear Algebra, refers to a technique that optimizes and speeds up specific kinds of AI workloads, including fine-tuning and serving.

Google’s own tensor processing units (TPUs) qualify as XLA devices, along with the latest Nvidia GPUs.
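To make "XLA device" a bit more concrete, here is a minimal, purely illustrative JAX sketch (none of this is MaxDiffusion's actual code): JAX traces a function once and hands it to XLA, which compiles a fused program for whichever backend is attached, whether that is a TPU, a GPU, or a CPU.

```python
# Illustrative only: a toy diffusion-style update compiled by XLA via JAX.
# Shapes and the update rule are made up; real samplers are far more involved.
import jax
import jax.numpy as jnp

@jax.jit  # compile with XLA for whatever backend is available (TPU/GPU/CPU)
def denoise_step(latents, noise_pred, step_size):
    # One toy update of the kind a diffusion sampler repeats many times.
    return latents - step_size * noise_pred

latents = jnp.zeros((1, 64, 64, 4))
noise_pred = jnp.ones_like(latents)
out = denoise_step(latents, noise_pred, 0.1)
print(out.shape, jax.devices())  # jax.devices() lists the XLA devices in use
```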

In addition to MaxDiffusion, Google unveiled JetStream, a new engine for running generative AI models, specifically text-generating models (so not Stable Diffusion). For now JetStream supports only TPUs, with GPU compatibility expected in the near future, and Google claims it delivers up to 3x higher performance per dollar for models such as Google's Gemma 7B and Meta's Llama 2.

“As businesses scale their AI workloads, there’s a growing demand for an economical inference stack that provides high performance,” Mark Lohmeyer, Google Cloud’s GM of compute and machine learning infrastructure, wrote in a blog post. “JetStream addresses this need and includes optimizations for popular open models such as Llama 2 and Gemma.”

“3x better performance per dollar” is a bold claim, and it leaves open several questions about how the figure was calculated. Which generation of TPU was used for the comparison? Which baseline engine? And how is “performance” being measured in this context?
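To make the ambiguity concrete, here is one plausible way a "performance per dollar" figure could be computed. The throughput and pricing numbers below are entirely made up for illustration; none of them come from Google.

```python
# Hypothetical back-of-the-envelope math for "performance per dollar".
# All figures are invented for illustration, not Google's benchmarks.
def perf_per_dollar(tokens_per_second: float, price_per_hour: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / price_per_hour  # tokens generated per dollar spent

baseline = perf_per_dollar(tokens_per_second=400, price_per_hour=4.20)
jetstream = perf_per_dollar(tokens_per_second=1200, price_per_hour=4.20)
print(f"relative gain: {jetstream / baseline:.1f}x")  # 3.0x in this toy case
```

Even in this toy version, the answer depends entirely on which baseline engine, which TPU generation, and which pricing you plug in, which is exactly why the questions above matter.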

I’ve reached out to Google for clarification on these points and will provide updates as they come.

Next on Google’s list of open-source initiatives is an expansion of MaxText, its collection of text-generating AI models that target TPUs and Nvidia GPUs in the cloud. MaxText now includes Gemma 7B, OpenAI’s GPT-3 (the predecessor to GPT-4), Llama 2, and models from AI startup Mistral, all of which developers can customize and fine-tune to their needs.
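For a sense of what "customizable and fine-tunable" means in practice, here is a generic JAX/Optax training step of the kind such reference implementations wrap. This is not MaxText's actual API, and the tiny linear "model" below stands in for a real transformer.

```python
# Generic fine-tuning step in JAX + Optax; illustrative only, not MaxText code.
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros((16, 8)), "b": jnp.zeros(8)}   # stand-in for model weights
optimizer = optax.adamw(learning_rate=1e-4)
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    preds = x @ params["w"] + params["b"]
    return jnp.mean((preds - y) ** 2)

@jax.jit  # the whole update compiles to a single XLA program
def train_step(params, opt_state, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

x, y = jnp.ones((4, 16)), jnp.ones((4, 8))
params, opt_state, loss = train_step(params, opt_state, x, y)
```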

“We’ve extensively optimized these models for performance on TPUs and closely collaborated with Nvidia to enhance performance on large GPU clusters,” Lohmeyer noted. “These enhancements lead to improved GPU and TPU utilization, resulting in better energy efficiency and cost savings.”

Lastly, Google has partnered with AI startup Hugging Face to develop Optimum TPU, a toolset aimed at simplifying the process of deploying specific AI workloads onto TPUs. The objective is to lower the barriers for integrating generative AI models into TPU infrastructure, particularly for text-generating applications.

However, Optimum TPU is fairly bare-bones for now: it supports only the Gemma 7B model, and it does not yet support training generative models on TPUs, only running them for inference.
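In practice, serving Gemma 7B through Optimum TPU should look roughly like the sketch below. The `optimum.tpu` import path and the AutoModelForCausalLM entry point are assumptions based on the Transformers-style interface Hugging Face's Optimum packages usually expose, so check the project's documentation for the exact API.

```python
# Rough sketch of TPU inference with Optimum TPU; the optimum.tpu import
# and model class are assumed, not verified against the library's docs.
from transformers import AutoTokenizer
from optimum.tpu import AutoModelForCausalLM  # assumed entry point

model_id = "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads onto the attached TPU

inputs = tokenizer("Explain XLA in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```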

Google has promised further updates and enhancements in the future.
