Most companies creating AI models, especially generative models like ChatGPT and Stable Diffusion, rely heavily on GPUs. The parallel processing capabilities of GPUs make them ideal for training and deploying advanced AI systems. However, there is a significant GPU shortage.
Reports indicate that Nvidia's top AI cards are sold out until 2024. TSMC's CEO has voiced concerns that the shortage of AI GPUs, both from Nvidia and its competitors, may extend into 2025. In light of this, Microsoft is forging its own path.
At the 2023 Ignite conference, Microsoft introduced its two custom-designed AI chips for data centers: the Azure Maia 100 AI Accelerator and the Azure Cobalt 100 CPU. The Maia 100 is engineered for training and executing AI models, while the Cobalt 100 is designed for general-purpose workloads.
“Microsoft is revolutionizing AI infrastructure to foster innovation. We are reengineering all facets of our data centers to better serve our customers,” stated Scott Guthrie, Executive Vice President of Microsoft’s Cloud and AI Group, in a press release. “Given our scale, optimizing and integrating every layer of our infrastructure stack is crucial to enhance performance, diversify our supply chain, and offer customers more choices.”
The Maia 100 and Cobalt 100 chips will begin rolling out to Azure data centers early next year, powering Microsoft's AI services, including Copilot, Azure OpenAI Service, and a suite of generative AI products. While this is just the beginning, Microsoft confirms that second-generation Maia and Cobalt hardware is already in development.
Custom Chips for AI Innovation
Microsoft's decision to create custom AI chips is not entirely unexpected, as plans have been in motion for some time. In April, The Information revealed that Microsoft had been quietly developing AI chips since 2019 under the code name "Athena." Furthermore, Bloomberg reported in 2020 that Microsoft had designed multiple ARM-based chips for data centers and consumer devices, such as the Surface Pro.
The Ignite announcement offers the clearest public picture yet of Microsoft's semiconductor initiatives.
First up is the Maia 100. This 5-nanometer chip houses 105 billion transistors and is crafted “specifically for the Azure hardware stack” to ensure peak hardware utilization. Microsoft anticipates that the Maia 100 will power significant internal AI workloads across platforms like Bing and Microsoft 365—though, for now, it's reserved for internal use.
Hard specifics are scarcer than the marketing language suggests: Microsoft has yet to disclose details of the Maia 100's architecture and hasn't submitted the chip for public benchmarking, limiting comparisons with competitors like Google's TPU or Amazon's Trainium. It is promising, however, that OpenAI provided input on the Maia 100's design, continuing the two companies' collaborative relationship.
In 2020, Microsoft and OpenAI together developed an Azure-hosted "AI supercomputer," utilizing over 285,000 processor cores and 10,000 GPUs for powerful computational needs. As OpenAI CEO Sam Altman commented, the partnership has since evolved to optimize Azure's AI infrastructure, with OpenAI helping refine the Maia chip to advance its model-training initiatives.
Microsoft also shared details about the Cobalt 100, an Arm-based CPU built for energy efficiency that features a 128-core architecture optimized for cloud-native services.
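One practical consequence of an Arm-based server CPU is that cloud-native workloads need arm64 builds rather than the x86-64 binaries most pipelines produce by default. Here is a minimal sketch of what that looks like in practice, using only the Python standard library rather than any Microsoft-specific API; the Cobalt-class behavior is an assumption based on how Linux reports Arm hosts generally:

```python
import platform

# Detect the host CPU architecture. On an Arm-based server CPU (such as a
# Cobalt-class part, by assumption), Linux typically reports "aarch64";
# macOS on Apple silicon reports "arm64".
arch = platform.machine()

if arch in ("aarch64", "arm64"):
    print(f"Arm64 host detected ({arch}): deploy arm64-native builds/images.")
else:
    print(f"{arch} host detected: cross-compile or publish multi-arch "
          "container images to also cover Arm64 targets.")
```

In practice, multi-architecture container images let the same deployment land on either CPU family without per-host branching like this; the check simply makes the architecture split visible.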
The Rationale Behind Custom AI Chips
The question arises: why would Microsoft create its own AI chips? On the surface, it's about "optimizing every layer of the Azure technology stack." Beneath that, Microsoft aims to stay competitive and keep costs manageable in the relentless AI arms race.
GPUs are in short supply, leaving companies, including Microsoft, increasingly reliant on chip vendors. Nvidia recently reached a valuation exceeding $1 trillion on the strength of AI-related demand, joining the small club of tech firms ever to hit that milestone. AMD, meanwhile, anticipates its data center GPU revenue will exceed $2 billion in 2024 despite its smaller market presence.
Microsoft is seeking to reduce its dependence on external vendors. Altman has already noted that GPU shortages were impeding OpenAI's progress, and the company limited new ChatGPT sign-ups due to capacity constraints. He has also indicated that further investment from Microsoft will be essential to cover impending model-training costs.
In a bid to secure adequate supply, Microsoft has been incentivizing Azure customers to relinquish unused GPU reservations and has pledged significant investments in third-party GPU providers. If OpenAI moves ahead with developing its own AI chips, the dynamics of the partnership may shift; however, Microsoft appears to believe the potential cost savings from in-house hardware justify that risk.
The operational costs of AI products such as GitHub Copilot underscore the urgency of streamlining expenses: estimates put model-inference costs at up to $80 per user per month. Some financial analysts predict Microsoft may struggle to generate meaningful AI revenue next year if conditions do not improve.
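A rough back-of-envelope calculation shows why those inference costs sting. The $80 figure is the estimate cited above; the roughly $10-per-month subscription price used here reflects GitHub Copilot's publicly listed individual plan at the time, and both numbers should be read as illustrative rather than confirmed internals:

```python
# Illustrative unit economics for an AI coding assistant. Both figures are
# rough, reported estimates, not confirmed Microsoft internals.
subscription_price = 10.00         # USD per user per month (individual plan)
inference_cost_heavy_user = 80.00  # reported upper-bound cost per user

shortfall = inference_cost_heavy_user - subscription_price
print(f"Worst-case shortfall: ${shortfall:.0f} per user per month")  # -> $70
```

Even if typical users cost far less to serve, a long tail of heavy users can erase the margin across a subscription base, which is exactly the pressure in-house silicon is meant to relieve.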
While the venture into hardware manufacturing poses challenges—evidenced by Meta’s struggles with its AI chips—Microsoft remains determined and optimistic. As Pat Stemen of Microsoft’s Azure hardware systems team affirmed, “Our innovation is extending down into the silicon stack to ensure optimized performance and cost for our customers’ workloads.”
In summary, Microsoft is committed to building custom AI chips aimed at improving efficiency and enabling continued AI innovation in its cloud services. As it embarks on this ambitious path, the industry will be watching closely to see how these efforts unfold.