"MLPerf Inference 4.1 Results Highlight Performance Gains as Nvidia Blackwell Makes Its First Testing Appearance"

MLCommons has announced its latest MLPerf inference results, showcasing a new generative AI benchmark and the first validated performance metrics for Nvidia's next-generation Blackwell GPU processor.

As a multi-stakeholder, vendor-neutral organization, MLCommons oversees the MLPerf benchmarks for AI training and inference. The latest results, featuring 964 performance submissions from 22 organizations, provide a crucial overview of the fast-evolving AI hardware and software landscape. By offering standardized and reproducible measurements of AI inference performance, MLPerf equips enterprise decision-makers with the insights needed to navigate AI deployment complexities, balancing performance, efficiency, and cost.

Key Highlights from MLPerf Inference v4.1

Among the notable updates in MLPerf Inference v4.1 is the introduction of a Mixture of Experts (MoE) benchmark, which evaluates the performance of the Mixtral 8x7B model. This round also showcased a diverse array of new processors and systems, including AMD's MI300X, Google's TPUv6e (Trillium), Intel's Granite Rapids, Untether AI's speedAI 240, and the Nvidia Blackwell B200 GPU.

David Kanter, founder and head of MLPerf at MLCommons, expressed excitement about the diverse submissions: “The broader the range of systems evaluated, the greater the opportunities for comparison and insights within the industry.”

The MoE Benchmark for AI Inference

A significant advancement in this round is the MoE benchmark, aimed at the challenges posed by increasingly large language models. Miro Hodak, senior technical staff member at AMD and MLCommons inference working group chair, explained that rather than relying on a single large dense model, the MoE approach routes each query across several smaller, domain-focused expert networks, activating only a subset of them at a time, which improves efficiency during deployment.
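To make the routing idea concrete, below is a minimal PyTorch sketch of a mixture-of-experts layer; the dimensions, expert count, and top-k value are illustrative placeholders rather than Mixtral's actual configuration. (For reference, Mixtral 8x7B activates two of its eight experts per token, so only a fraction of its total parameters is used at each step.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a learned router sends each token to
    its top-k experts, so only a fraction of the parameters runs per token."""

    def __init__(self, dim: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # per-expert routing scores
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize the k scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)     # 16 tokens, 64-dim embeddings
print(MoELayer()(tokens).shape)  # torch.Size([16, 64])
```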

The MoE benchmark evaluates hardware performance using the Mixtral 8x7B model, which combines eight experts of 7 billion parameters each. The benchmark spans three tasks (a short example of querying the model follows the list):

- Question-answering based on the Open Orca dataset

- Math reasoning using the GSM8K dataset

- Coding tasks based on the MBXP dataset
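
For readers who want to probe the reference model themselves, the sketch below shows one way to send a GSM8K-style math prompt to the publicly released Mixtral 8x7B checkpoint using the Hugging Face Transformers library. It is an illustration only, not the official MLPerf harness, which imposes strict accuracy and latency requirements.

```python
# Minimal sketch, not the official MLPerf harness: send a GSM8K-style prompt
# to the public Mixtral 8x7B instruct checkpoint via Hugging Face Transformers.
# Assumes `transformers` and `accelerate` are installed and enough GPU memory
# (or CPU RAM) is available to hold the ~47B-parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Janet has 3 apples and buys 2 more. How many apples does she have?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```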

Hodak emphasized that the MoE benchmark not only exercises a wider range of model strengths than traditional single-task benchmarks but also points enterprises toward more efficient AI deployments.

Nvidia's Blackwell GPU: Promising AI Inference Enhancements

The MLPerf testing process provides vendors with a platform to demonstrate upcoming technology with rigorously peer-reviewed results. Among the highly anticipated releases is Nvidia’s Blackwell GPU, announced in March. Although it will be several months before users can access Blackwell, the MLPerf Inference 4.1 results offer a glimpse of its capabilities.

“This is our first performance disclosure of measured data on Blackwell, and we’re excited to share this,” said Dave Salvator from Nvidia during a recent briefing.

The benchmarks specifically highlight the generative AI workload performance based on MLPerf’s largest LLM workload, Llama 2 70B. “We’re achieving 4x more performance per GPU compared to our previous generation,” Salvator noted.

In addition to the new Blackwell GPU, Nvidia continues to extract more performance from its existing hardware. The MLPerf Inference 4.1 results indicate that the Hopper GPU has improved by 27% since the last benchmarks six months ago, driven purely by software enhancements.

“These gains come from software alone,” Salvator explained. “We utilized the same hardware as before, but ongoing software optimizations enable us to achieve higher performance.”

With these advancements, MLCommons' latest MLPerf Inference results provide critical insights into the future of AI hardware and its deployment potential in various enterprise applications.
