The rising demand for generative AI, which is typically trained and run on GPUs, has led to a significant shortage of those components. Nvidia's best-performing GPUs are reportedly sold out until 2024, and TSMC's CEO has suggested that the GPU shortage, from Nvidia and its rivals alike, could extend into 2025.
To reduce their dependence on GPUs, tech giants with the resources to invest are now creating and offering custom chips designed specifically for developing, refining, and deploying AI models. Amazon is one such company; during its annual AWS re:Invent conference, it unveiled the latest generation of its dedicated chips for model training and inference.
The first announcement was AWS Trainium2, which promises up to four times the performance and twice the energy efficiency of its predecessor, Trainium, introduced in December 2020. Available in EC2 Trn2 instances within AWS, the chips can be deployed in clusters of 16, scaling to as many as 100,000 chips in AWS's EC2 UltraCluster product.
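For context, requesting Trainium capacity looks like any other EC2 launch. Below is a minimal boto3 sketch; it uses the existing first-generation trn1.32xlarge instance type as a stand-in, since Trn2 instance sizes had not been published at the time of writing, and the AMI ID is a placeholder rather than a real image.

```python
import boto3

# Sketch: requesting a Trainium-backed EC2 instance via boto3.
# trn1.32xlarge is a first-generation Trainium instance type; Trn2
# instance sizes had not been announced when this was written.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: substitute a Neuron-enabled AMI
    InstanceType="trn1.32xlarge",     # stand-in until Trn2 types are published
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```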
Amazon states that a cluster of 100,000 Trainium2 chips can deliver 65 exaflops of computing power, which works out to 650 teraflops per chip. Back-of-napkin math like this glosses over complicating factors such as interconnect speed, so the true per-chip figure is hard to pin down. Even so, if a single Trainium2 chip delivers on the order of 200 teraflops, it comfortably surpasses Google's custom AI training chips from 2017 (the second-generation TPU was rated at 180 teraflops across a four-chip board).
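As a quick sanity check on that arithmetic, using only the numbers Amazon quoted and ignoring real-world losses like interconnect overhead:

```python
# Back-of-napkin check of Amazon's headline figures.
cluster_flops = 65e18   # 65 exaflops, as claimed for the full cluster
chips = 100_000

per_chip_tflops = cluster_flops / chips / 1e12
print(f"{per_chip_tflops:.0f} teraflops per chip")  # -> 650 teraflops per chip
```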
Notably, Amazon says a cluster of 100,000 Trainium2 chips can train a 300 billion parameter large language model in weeks rather than months. That is roughly 1.7 times the size of OpenAI's 175-billion-parameter GPT-3, the predecessor to the renowned GPT-4.
"Silicon is fundamental to every customer workload, making it a key area for innovation at AWS," stated David Brown, AWS's VP of compute and networking, in a press release. "With the surge in interest surrounding generative AI, Trainium2 will enable customers to train their machine learning models more swiftly, cost-effectively, and efficiently."
While Amazon has not specified when Trainium2 instances will be available to AWS customers, it has indicated that they will be released "sometime next year." We will continue to monitor and provide updates as more details emerge.
The second chip unveiled was the Arm-based Graviton4, aimed at inference tasks. The fourth generation in Amazon's Graviton lineup, it is distinct from Inferentia, Amazon's existing chip designed for similar workloads.
Amazon says Graviton4 offers up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the previous-generation Graviton3 (though not the newer Graviton3E) running on Amazon EC2. All of Graviton4's hardware interfaces are also encrypted, which Amazon says better protects AI training workloads and data for customers with stringent encryption requirements. We have reached out to Amazon for clarification on what “encrypted” entails here and will update this piece when we hear back.
"Graviton4 represents our fourth generation of chips released in just five years, and it is the most powerful and energy-efficient chip we've developed for diverse workloads," Brown added. "By honing our chip designs to address real customer demands, we can offer the most advanced cloud infrastructure available."
Graviton4 is currently available in preview within Amazon EC2 R8g instances, with general availability expected in the coming months.
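For readers who want to check whether R8g instances have reached their region, here is a minimal boto3 sketch that lists Arm64 instance types and filters for the R8g family. It assumes standard AWS credentials are configured, and since R8g is still in preview, the family may not be visible to every account or region.

```python
import boto3

# Sketch: listing Arm64 (Graviton-family) instance types in a region,
# then picking out the Graviton4-backed, memory-optimized R8g family.
ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_types")
pages = paginator.paginate(
    Filters=[{"Name": "processor-info.supported-architecture", "Values": ["arm64"]}]
)

for page in pages:
    for itype in page["InstanceTypes"]:
        name = itype["InstanceType"]
        if name.startswith("r8g"):  # preview types may not appear for all accounts
            print(name)
```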