Nvidia has unveiled its most powerful system to date, the DGX SuperPod, at its GTC conference. The system anchors a comprehensive hardware and software rollout.
In recent years, the DGX line has become a cornerstone of Nvidia's server and cloud offerings. The new DGX SuperPod is built on Nvidia's next-generation GPU architecture for AI acceleration, Blackwell, unveiled as the successor to Hopper. Blackwell is designed to support AI models with a trillion parameters.
What is the DGX SuperPod?
The DGX SuperPod isn't just a single server; it’s a robust configuration of multiple DGX GB200 systems. Each system comprises 36 Nvidia GB200 Superchips, integrating 36 Nvidia Grace CPUs and 72 Nvidia Blackwell GPUs, all connected via fifth-generation Nvidia NVLink. This supercomputing platform can scale to include eight or more DGX GB200 systems, linking tens of thousands of GB200 Superchips through Nvidia Quantum InfiniBand.
The system provides 240 terabytes of memory, critical for training large language models (LLMs) and running generative AI inference at scale, and delivers 11.5 exaflops of AI supercomputing performance.
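A quick back-of-envelope check ties the configuration figures above together. Assuming the 11.5-exaflop number refers to the base eight-system SuperPod (an assumption; Nvidia's materials quote low-precision AI throughput), the implied per-GPU rate is roughly 20 petaflops:

```python
# Back-of-envelope sketch of the quoted DGX SuperPod figures.
# Assumption: 11.5 exaflops applies to the base eight-system configuration.

GPUS_PER_SYSTEM = 72        # 36 GB200 Superchips x 2 Blackwell GPUs each
SYSTEMS = 8                 # base SuperPod configuration
TOTAL_AI_EXAFLOPS = 11.5    # aggregate figure quoted by Nvidia

total_gpus = GPUS_PER_SYSTEM * SYSTEMS                      # 576 GPUs
per_gpu_petaflops = TOTAL_AI_EXAFLOPS * 1000 / total_gpus   # ~20 PFLOPS/GPU

print(total_gpus, round(per_gpu_petaflops, 1))
```

This is illustrative arithmetic only; larger SuperPod builds linking tens of thousands of Superchips would scale the aggregate figure accordingly.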
Advanced Networking and Processing
A key feature of the DGX SuperPod is its unified compute fabric, facilitated by the newly introduced Nvidia Quantum-X800 InfiniBand networking technology, which offers up to 1,800 gigabytes per second of bandwidth to each GPU. The system also integrates Nvidia BlueField-3 Data Processing Units (DPUs) along with fifth-generation Nvidia NVLink.
Furthermore, the DGX SuperPod incorporates fourth-generation Nvidia Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology, which delivers 14.4 teraflops of in-network computing, representing a fourfold increase over its predecessor.
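To see what "in-network computing" buys, consider the sum-allreduce at the heart of distributed training: every GPU must end up with the elementwise sum of all GPUs' gradients. SHARP performs this aggregation inside the switch fabric instead of on the GPUs; the sketch below is a toy pure-Python illustration of the result being computed, not Nvidia's implementation:

```python
# Toy sketch of a sum-allreduce: the collective operation that SHARP
# offloads from GPUs into the network switches.

def allreduce_sum(buffers):
    """Return the buffer every rank receives: the elementwise sum of all inputs."""
    reduced = [0] * len(buffers[0])
    for buf in buffers:                   # with SHARP, this aggregation
        for i, v in enumerate(buf):       # happens in-network rather than
            reduced[i] += v               # on the GPUs themselves
    return [list(reduced) for _ in buffers]   # result broadcast to all ranks

grads = [[1, 2], [3, 4], [5, 6]]          # toy per-GPU gradient buffers
print(allreduce_sum(grads))               # every rank gets [9, 12]
```

Doing this reduction in the switches shortens the collective's critical path and frees GPU cycles, which is where the quoted 14.4 teraflops of in-network compute applies.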
Blackwell in the Nvidia DGX Cloud
The GB200-based DGX systems will soon be available through Nvidia's DGX Cloud service, initially accessible on major platforms like Amazon Web Services (AWS), Google Cloud, and Oracle Cloud.
According to Ian Buck, VP of Hyperscale and HPC at Nvidia, "DGX Cloud is designed in partnership with our cloud partners to deliver the best Nvidia technology for our AI research and to our customers." The new GB200 architecture will also enhance the Project Ceiba supercomputer, which Nvidia is developing with AWS, aimed at creating the world's largest public cloud supercomputing platform.
Buck added: "Project Ceiba has evolved, now upgraded to Grace Blackwell architecture supporting 20,000 GPUs, enabling over 400 exaflops of AI."
These advancements keep Nvidia at the forefront of AI infrastructure and make the DGX SuperPod a powerful platform for organizations engaged in AI research and applications.