Musk's xAI Moves from Oracle Cloud to Create a Huge GPU Cluster for Grok 3 Development

Elon Musk's xAI startup has decided not to extend its partnership with Oracle for cloud services aimed at future AI training workloads. According to reports, discussions between the two companies concluded after Oracle acknowledged it could not meet xAI’s ambitious timeline. Musk confirmed this decision on X (formerly Twitter), emphasizing that their success hinges on speed and control: “We must have our own hands on the steering wheel, rather than be a backseat driver.”

Previously, xAI used Oracle Cloud servers to train its Grok and Grok 1.5 language models. Now, however, the startup is building its own training cluster, a state-of-the-art supercomputer in Memphis, to expand its AI training capacity. xAI is set to launch Grok 2 in August and is already planning Grok 3. Musk believes that training Grok 3 effectively will require an extraordinary amount of compute: as many as 100,000 Nvidia H100 GPUs.

As one of Oracle's largest cloud customers, xAI used approximately 16,000 Nvidia GPUs for its model training. Yet the growing demand for hardware led to tensions; Oracle's co-founder and Chief Technology Officer, Larry Ellison, remarked during the company's Q2 earnings call that xAI "wants a lot more GPUs than we gave them," adding that Oracle was working to accommodate those heightened requests.

Musk's vision for Grok 3 is ambitious: he aims for it to compete with or even surpass GPT-5, the much-anticipated upcoming release from OpenAI that promises a significant leap in language capabilities. In just a year, xAI has reached a valuation of $24 billion, but Musk is committed to accelerating development to challenge the established AI leaders.

“We decided to pursue the 100k H100 and our next major internal system because our competitiveness fundamentally relies on being faster than any other AI company. This is the only way to catch up,” Musk stated on X. While he acknowledged Oracle as a great company, he has shifted his focus towards Nvidia, Dell, and Supermicro to establish what he believes will be the most powerful AI training infrastructure in the world.

Musk added that xAI's upcoming liquid-cooled system is slated to begin training later this month, part of a larger plan to build what xAI intends to be the largest supercomputer in the world and thereby solidify its position in the rapidly evolving AI landscape.
