Nvidia has launched its next-generation Blackwell graphics processing units (GPUs), which the company says deliver up to 25 times lower cost and energy consumption for AI processing tasks.
The new Nvidia GB200 Grace Blackwell Superchip combines multiple chips in a single package, promising up to a 30 times performance improvement for large language model (LLM) inference workloads compared to previous models. During a keynote presentation at Nvidia GTC 2024, CEO Jensen Huang highlighted Blackwell as a pivotal advancement in computing, with plans for gaming products to follow.
Huang humorously noted that the prototypes he showcased were valued at $10 billion and $5 billion, underscoring the significance of the Grace Blackwell system. “For three decades, we’ve pursued accelerated computing to enable breakthroughs in deep learning and AI,” he said. “Generative AI is shaping our era, and Blackwell GPUs will drive this industrial revolution across all sectors.”
Nvidia asserts that Blackwell-based systems will allow organizations to deploy real-time generative AI on trillion-parameter models at up to 25 times lower cost and energy consumption than the Hopper architecture. The processing capabilities will scale to models with up to 10 trillion parameters.
As Nvidia looks to maintain its competitive edge against inference-chip specialists like Groq and rival AI accelerator makers such as Cerebras, AMD, and Intel, Blackwell's significant cost and energy efficiencies over its predecessor strengthen its position.
Named after mathematician David Harold Blackwell, the first Black scholar inducted into the National Academy of Sciences, the Blackwell platform succeeds Nvidia’s Hopper architecture, setting new benchmarks in accelerated computing. Originally designed for gaming graphics, GPUs have become the backbone of AI processing, propelling Nvidia's market capitalization to $2.2 trillion and attracting media attention at events like Nvidia GTC.
The platform introduces six innovative technologies that could transform various fields, including data processing, engineering simulations, electronic design automation, computer-aided drug design, quantum computing, and generative AI.
Huang claimed Blackwell will emerge as the world’s most powerful chip, featuring 208 billion transistors manufactured on TSMC’s advanced 4NP process. Its second-generation transformer engine adds micro-tensor scaling support and advanced dynamic-range management, doubling compute capacity while introducing 4-bit floating point (FP4) AI inference.
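Nvidia did not detail the micro-tensor scaling scheme in the keynote, but the general idea behind block-scaled 4-bit floating point inference can be sketched in a few lines. The snippet below is a minimal illustration, assuming an FP4 e2m1 format (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and one scale factor per small block of values; the function names and block size are hypothetical, not Nvidia's implementation.

```python
import numpy as np

# Representable magnitudes of an FP4 e2m1 value. The e2m1 format is an
# assumption for illustration; Nvidia has not published Blackwell's exact
# low-level encoding.
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blockwise(x: np.ndarray, block_size: int = 32):
    """Quantize a 1-D tensor to FP4 with one scale per block of elements,
    mimicking the idea behind micro-tensor (block-wise) scaling.
    Assumes len(x) is divisible by block_size."""
    blocks = x.reshape(-1, block_size)
    # One scale per block: map the block's max magnitude onto FP4's max (6.0).
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP4_LEVELS[-1]
    scales[scales == 0] = 1.0  # avoid dividing an all-zero block by zero
    scaled = blocks / scales
    # Snap each scaled magnitude to the nearest representable FP4 level.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_LEVELS).argmin(axis=-1)
    return np.sign(scaled) * FP4_LEVELS[idx], scales

def dequantize(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (quantized * scales).reshape(-1)

weights = np.random.randn(1024).astype(np.float32)
q, s = quantize_fp4_blockwise(weights)
print(f"mean abs error: {np.abs(dequantize(q, s) - weights).mean():.4f}")
```

Smaller blocks track local dynamic range more closely at the cost of storing more scale factors; managing that trade-off in hardware is what per-block scaling schemes of this kind aim for.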
Nvidia also launched its fifth-generation NVLink interconnect, enabling high throughput for multitrillion-parameter AI models. The latest NVLink iteration provides 1.8 TB/s of bidirectional throughput per GPU, facilitating seamless communication among up to 576 GPUs for complex LLMs. Additionally, the RAS Engine integrated into Blackwell GPUs enhances system reliability and reduces operating costs through AI-based preventive maintenance.
The Blackwell architecture will be integral to major server systems. With advanced confidential computing capabilities, it protects AI models and customer data while maintaining high performance—crucial for privacy-sensitive industries. The dedicated decompression engine accelerates database queries, boosting data analytics and processing performance.
The Nvidia GB200 NVL72, a rack-scale system built around the GB200 superchip, delivers 1.4 exaflops of AI performance and 30TB of fast memory. Major cloud providers and AI leaders, including Amazon, Google, Meta, Microsoft, and OpenAI, are expected to adopt the platform, indicating a major shift in computational capabilities.
The GB200 Grace Blackwell Superchip connects two Nvidia B200 Tensor Core GPUs to the Nvidia Grace CPU through a 900GB/s ultra-low-power link, achieving a performance increase of up to 30 times over the Nvidia H100 Tensor Core GPU for LLM inference while cutting costs and energy consumption by up to 25 times.
The GB200 is a crucial component of the multi-node, liquid-cooled NVL72 system that combines 36 Grace Blackwell Superchips, featuring 72 Blackwell GPUs and 36 Grace CPUs interconnected via fifth-generation NVLink. Additionally, the system integrates Nvidia BlueField-3 data processing units to enhance cloud networking, storage security, and GPU compute flexibility for hyperscale AI applications.
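The rack-level composition follows directly from the per-superchip figures above; a quick back-of-envelope check, using only numbers quoted in this article, looks like this:

```python
# Back-of-envelope arithmetic from the figures quoted in this article.
SUPERCHIPS_PER_RACK = 36   # GB200 superchips in one NVL72 system
GPUS_PER_SUPERCHIP = 2     # each GB200 pairs two B200 GPUs with one Grace CPU
NVLINK_TBPS_PER_GPU = 1.8  # fifth-generation NVLink, bidirectional, per GPU

gpus = SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP   # 72 Blackwell GPUs
cpus = SUPERCHIPS_PER_RACK                        # 36 Grace CPUs
aggregate_nvlink_tbps = gpus * NVLINK_TBPS_PER_GPU

print(f"{gpus} GPUs, {cpus} CPUs, ~{aggregate_nvlink_tbps:.0f} TB/s aggregate NVLink")
```

The aggregate bandwidth figure is a naive upper bound; what the rack actually sustains depends on the NVLink switch topology, which Nvidia's announcement does not break down.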
Nvidia’s HGX B200 server board interconnects eight B200 GPUs to support leading x86-based generative AI platforms, offering networking speeds up to 400Gb/s through Nvidia’s Quantum-2 InfiniBand and Spectrum-X Ethernet technologies.
The GB200 will also be available on Nvidia DGX Cloud, an AI platform co-developed with major cloud service providers, providing developers with essential tools for building advanced generative AI models. Companies like Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo, and Supermicro, along with several others, are expected to deliver a variety of servers based on Blackwell technology.