MLCommons Releases MLPerf 4.0 Benchmarks for AI Inference
MLCommons has unveiled its MLPerf 4.0 benchmarks for AI inference, highlighting the rapid advancements in software and hardware.
As generative AI evolves and gains traction, the need for a vendor-neutral performance benchmarking framework is more critical than ever. MLCommons fills this need through its MLPerf benchmarks, which provide valuable insights into both training and inference capabilities. The MLPerf Inference 4.0 results mark the first update since the MLPerf Inference 3.1 results were published in September 2023.
AI development has progressed significantly over the past six months, with major hardware vendors such as Nvidia and Intel optimizing their products for inference performance. The new MLPerf 4.0 results show substantial improvements in technologies from both companies.
Notably, the MLPerf inference benchmarks themselves have changed. While MLPerf 3.1 featured the 6-billion-parameter GPT-J model for text summarization, MLPerf 4.0 shifts to the widely used Llama 2 70B model for question answering (Q&A). Additionally, MLPerf 4.0 introduces its first benchmark for generative AI image creation, using Stable Diffusion.
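To make the new Q&A task concrete, here is a minimal sketch of the kind of Llama 2 70B inference the benchmark exercises, assuming the Hugging Face transformers pipeline API and the gated meta-llama/Llama-2-70b-chat-hf checkpoint; actual MLPerf submissions run a far more elaborate, tuned harness.

```python
# Minimal sketch of the Q&A-style workload MLPerf 4.0 measures with
# Llama 2 70B. This is not the MLPerf harness; it only illustrates the task.
# Assumes access to the gated meta-llama/Llama-2-70b-chat-hf weights.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-70b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",  # shard the 70B weights across available GPUs
)

prompt = "Question: What do the MLPerf benchmarks measure?\nAnswer:"
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```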
"MLPerf serves as the industry standard for enhancing speed, efficiency, and accuracy in AI," said David Kanter, MLCommons Founder and Executive Director, during a press briefing.
Why AI Benchmarks Matter
The latest MLCommons benchmark includes over 8,500 performance results, evaluating various combinations of hardware, software, and AI inference use cases. Kanter emphasized the importance of establishing meaningful metrics for AI performance.
"The goal is to create robust metrics that measure AI capabilities, enabling further enhancements," he explained.
MLCommons aims to unify the industry by conducting standardized tests using consistent datasets and configurations across different systems. All results are shared with participants, fostering transparency and collaborative improvement.
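MLCommons distributes these standardized tests through its open-source LoadGen library, which generates the query traffic and records latencies for every submission. Below is a hedged sketch of how a system under test typically wires into it, assuming the mlperf_loadgen Python bindings; exact callback signatures vary across LoadGen releases.

```python
# Hedged sketch of an MLPerf LoadGen harness (Offline scenario).
# Assumes the mlperf_loadgen Python bindings; callback signatures
# vary across LoadGen releases.
import array
import mlperf_loadgen as lg

def issue_queries(query_samples):
    # Run the model on each sample and report completions to LoadGen.
    for qs in query_samples:
        result = array.array("B", b"\x00")  # placeholder model output
        addr, _ = result.buffer_info()
        lg.QuerySamplesComplete(
            [lg.QuerySampleResponse(qs.id, addr, len(result))]
        )

def flush_queries():
    pass

def load_samples(indices):
    pass  # load the referenced dataset samples into memory

def unload_samples(indices):
    pass  # release them

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```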
Ultimately, this standardized approach empowers enterprises to make informed decisions when selecting AI solutions.
“This aids buyers in evaluating systems, whether on-premises, cloud-based, or embedded, based on relevant workloads,” Kanter noted. “If you’re in the market for a system to run large language model inference, benchmarks can guide your choices.”
Nvidia Leads the Way in AI Inference Performance
Nvidia once again showcases its dominance in the MLPerf benchmarks with remarkable results.
While new hardware typically drives performance gains, Nvidia has also markedly improved inference on its existing silicon. Using its open-source TensorRT-LLM inference library, the company nearly tripled the inference performance of its H100 Hopper GPU on text summarization with the GPT-J model.
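For illustration, here is a hedged sketch of GPT-J-style summarization through TensorRT-LLM's high-level Python API; entry points differ across releases, and Nvidia's MLPerf submission uses a far more heavily tuned pipeline.

```python
# Hedged sketch of GPT-J summarization with TensorRT-LLM's high-level
# Python API. Entry points differ across releases; this is generic
# example code, not Nvidia's MLPerf submission.
from tensorrt_llm import LLM, SamplingParams

# Builds a TensorRT engine for the model under the hood.
llm = LLM(model="EleutherAI/gpt-j-6b")
params = SamplingParams(temperature=0.0, max_tokens=128)

article = "MLCommons has released its MLPerf 4.0 inference results ..."
outputs = llm.generate([f"Summarize:\n{article}\nSummary:"], params)
print(outputs[0].outputs[0].text)
```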
Dave Salvator, Nvidia's Director of Accelerated Computing Products, expressed excitement over the performance gains achieved within six months. “We’ve improved performance significantly, thanks to our engineering team’s efforts to optimize the Hopper architecture,” he said.
Just last week at GTC, Nvidia announced the Blackwell GPU, the successor to the Hopper architecture. While the timeline for benchmarking Blackwell in MLPerf isn't confirmed, Salvator hopes it will occur soon.
Even before Blackwell's benchmarking, the MLPerf 4.0 results feature the new H200 GPU, which boasts up to 45% faster inference performance compared to the H100 when evaluated with Llama 2.
Intel Reinforces the Importance of CPUs in AI Inference
Intel actively participated in the MLPerf 4.0 benchmarks, showcasing both its Gaudi AI accelerator, developed by its Habana Labs unit, and its Xeon CPU technologies.
Although Gaudi's raw performance trails Nvidia's H100, Intel asserts that it offers a superior price-to-performance ratio. More significantly, the new 5th Gen Intel Xeon processor shows impressive gains on inference tasks.
During a press briefing, Ronak Shah, AI Product Director for Xeon at Intel, highlighted that the 5th Gen Xeon delivers 1.42x the inference performance of the previous generation. On the GPT-J LLM text-summarization task specifically, the 5th Gen Xeon was up to 1.9x faster.
“We understand that many enterprises require solutions that integrate general-purpose and AI capabilities,” Shah stated. “Our CPUs are designed to combine robust general-purpose processing with advanced AI performance through our AMX [Advanced Matrix Extensions] engine.”
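As an illustration of the CPU path Shah describes, here is a hedged sketch of bfloat16 inference that dispatches to AMX on 4th/5th Gen Xeon, assuming PyTorch together with Intel Extension for PyTorch (ipex); this is generic example code, not Intel's MLPerf submission.

```python
# Hedged sketch: CPU inference tuned for Intel AMX, assuming PyTorch
# plus Intel Extension for PyTorch (ipex). On 4th/5th Gen Xeon,
# bfloat16 matmuls dispatch to AMX tile instructions via oneDNN.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6b", torch_dtype=torch.bfloat16
).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)  # fuse ops, prep for AMX

inputs = tok("Summarize: The MLPerf 4.0 results show ...", return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```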