Tech baron Elon Musk has taken to Twitter/X to boast of starting up “the most powerful AI training cluster in the world,” which he says will be used to train what he claims will be the “world’s most powerful AI by every metric” by December of this year. Today, xAI’s Memphis Supercluster began AI training using 100,000 liquid-cooled Nvidia H100 GPUs connected over a single RDMA (remote direct memory access) fabric.
“Nice work by @xAI team, @X team, @Nvidia & supporting companies getting Memphis Supercluster training started at ~4:20am local time. With 100k liquid-cooled H100s on a single RDMA fabric, it’s the most powerful AI training cluster in the world!” (July 22, 2024)
It seems unlikely that Musk personally flicked the switch to start up the supercluster, given that it commenced its gargantuan task at 4:20am CDT, but as you can see below, he did help out the fiber tech guy.
In May, we reported on Musk’s ambition to open the Gigafactory of Compute by Fall 2025. At the time, Musk was in a hurry to get the supercluster underway, which meant buying current-gen ‘Hopper’ H100 GPUs. It signaled that the tech tycoon didn’t have the patience to wait for H200 chips to roll out, let alone the upcoming Blackwell-based B100 and B200 GPUs, even though the newer Nvidia Blackwell data center GPUs are expected to ship before the end of 2024.
“Come help xAI route photons as a elite fiber tech in Memphis!” pic.twitter.com/JJShV75May (July 15, 2024)
So, if the Gigafactory of Compute was touted to open by Fall 2025, does today’s news mean the project has come to fruition a year early? That is possible, but it seems more likely that the sources talking to Reuters and The Information earlier this year misspoke or were misquoted regarding the timing of the project. Also, with the xAI Memphis Supercluster already up and running, the question of why xAI didn’t wait for more powerful or next-gen GPUs answers itself.
“Glad to be making history with @elonmusk, such a great experience to work with his Memphis team! To meet the target, our execution had to be as perfect as possible, as quick as possible, as efficient as possible and as environmentally friendly as possible – lots of hard work, but…” (July 22, 2024)
Supermicro provided much of the hardware, and the company's CEO, Charles Liang, also commented on Musk's thread, touting the team's execution. This follows Liang's recent glowing words for Musk's liquid-cooled AI data centers.
In a follow-up tweet, Musk explained that the new supercluster will be “training the world’s most powerful AI by every metric.” Based on previous statements of intent, we assume the power of xAI’s 100,000-H100 installation will now be directed at Grok 3 training. Musk said the refined LLM should be finished with its training stage “by December this year.”
To put the Memphis Supercluster’s compute resources in context: going by GPU count alone, it easily outclasses anything on the most recent Top500 list in terms of AI horsepower. The world’s most powerful supercomputers, such as Frontier (37,888 AMD GPUs), Aurora (60,000 Intel GPUs), and Microsoft Eagle (14,400 Nvidia H100 GPUs), appear significantly outgunned by the xAI machine.
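For a rough sense of that scale, here is a minimal back-of-envelope sketch, assuming Nvidia’s published figure of roughly 989 teraFLOPS of dense BF16 tensor throughput per SXM H100 and ignoring real-world training efficiency. Frontier and Aurora are left out because their AMD and Intel GPUs aren’t comparable part-for-part, but Eagle uses the same H100s, so the throughput ratio reduces to a simple GPU count ratio.

```python
# Back-of-envelope scale comparison. Assumes ~989 TFLOPS of dense BF16
# tensor throughput per SXM H100 (Nvidia's published peak; sustained
# training throughput is typically only a fraction of this).
H100_BF16_TFLOPS = 989

clusters = {
    "xAI Memphis Supercluster": 100_000,  # H100 count per Musk's post
    "Microsoft Eagle": 14_400,            # H100 count per Top500
}

for name, gpu_count in clusters.items():
    exaflops = gpu_count * H100_BF16_TFLOPS / 1_000_000  # TFLOPS -> EFLOPS
    print(f"{name}: {gpu_count:,} GPUs ~= {exaflops:.1f} EFLOPS peak BF16")

# Same GPU in both systems, so the ratio is just the count ratio:
print(f"xAI vs. Eagle scale factor: {100_000 / 14_400:.1f}x")
```

That works out to roughly 99 exaFLOPS of peak BF16 compute on paper, about 6.9 times Eagle’s. Note that the comparison is indicative rather than exact: Top500 rankings measure sustained FP64 Linpack performance, not the low-precision throughput that matters for AI training.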