AMD's latest commits to the LLVM GitHub repository mention a new "GFX950" GPU, likely the firm's recently announced Instinct MI355X or the now-launched MI325X accelerator, per Phoronix. The patches provide early enablement for the accelerator in LLVM, laying the groundwork for compiler back-end optimization and better software compatibility.
LLVM is a collection of modular compiler tools and libraries that optimize code for specific hardware architectures. It acts as a language-agnostic intermediary between high-level languages and machine code. Going by Team Red's nomenclature, Phoronix suggests that GFX950 is likely the internal codename for the MI325X or the MI350-series MI355X accelerator, though the latter is more likely since the MI325X has been available since October.
Looking into the commits, we find that AMD has added support for the "v_prng_b32" instruction, which offers hardware acceleration for random number generation, as well as for MFMA (Matrix Fused Multiply-Add) instructions used in the matrix operations common to machine learning. Additionally, there are mentions of "V_CVT_F32_BF16" instructions, which, going by AMD's destination-first naming, convert BF16 numbers to the FP32 format, and the LDS (Local Data Share) memory has been increased to 160KB.
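To illustrate how the BF16 and FP32 formats relate, here is a minimal Python sketch. It relies only on the fact that a BF16 value is the upper 16 bits of an IEEE-754 FP32 value. The function names are hypothetical and chosen for illustration; this models the number formats, not AMD's hardware implementation or the LLVM patches.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen a 16-bit BF16 bit pattern to FP32 (returned as a Python float)."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    # BF16 keeps FP32's sign bit and 8 exponent bits but only 7 mantissa bits,
    # so widening is a 16-bit left shift with zero-filled low mantissa bits.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


def float_to_bf16_bits_truncate(value: float) -> int:
    """Narrow FP32 to BF16 by dropping the low 16 bits.

    Assumption: plain truncation for simplicity; real hardware typically
    rounds to nearest even instead.
    """
    (fp32_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return fp32_bits >> 16


if __name__ == "__main__":
    for pattern in (0x3F80, 0x4000, 0xC040):
        print(f"0x{pattern:04X} -> {bf16_bits_to_float(pattern)}")
```

Widening is lossless because every BF16 value is exactly representable in FP32, which is part of why the format is popular for machine learning workloads.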
It is great to see that AMD is prepping its accelerators for launch, and we should hear more news in the coming months, possibly at CES 2025, as the official launch window draws near. The MI355X from the MI350 family is fabricated on TSMC's N3 node and boasts 288GB of HBM3E memory, with support for FP4 and FP6 data types. AMD touts an 80% uplift over the MI325X in FP16 and FP8 computations.
These chips will go head-to-head with Nvidia's Blackwell B300 chips by Q2 or Q3 next year. As it stands, the MI355X is expected to deliver 9.2 PetaFLOPS of FP4 compute performance, on par with Nvidia's B200 offerings. AMD is ahead in terms of memory capacity, featuring 288GB of HBM3E, presumably across eight 12-Hi stacks. That is 50% more than the B200 but rumored to be on par with the B300. However, Blackwell's debut has been marred by purported overheating issues and a design flaw, which could push volume B200 supply to Q1/Q2 2025.
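The memory figures above can be sanity-checked with some quick arithmetic. The short Python sketch below uses only the numbers quoted in this article (288GB, eight 12-Hi stacks, a 50% lead over the B200). The helper names are hypothetical, and the per-stack, per-die, and B200 figures are derived from those numbers rather than taken from official specifications.

```python
MI355X_HBM_GB = 288    # total HBM3E capacity quoted for the MI355X
STACK_COUNT = 8        # presumed number of HBM3E stacks
DIES_PER_STACK = 12    # "12-Hi" means twelve DRAM dies per stack
LEAD_OVER_B200 = 0.50  # "50% more than the B200"


def capacity_per_stack_gb() -> float:
    """Capacity each stack must provide to reach the quoted total."""
    return MI355X_HBM_GB / STACK_COUNT


def capacity_per_die_gb() -> float:
    """Capacity of each DRAM die within a 12-Hi stack."""
    return capacity_per_stack_gb() / DIES_PER_STACK


def implied_b200_capacity_gb() -> float:
    """B200 capacity implied by the quoted 50% lead."""
    return MI355X_HBM_GB / (1 + LEAD_OVER_B200)


if __name__ == "__main__":
    print(f"Per stack: {capacity_per_stack_gb():.0f}GB")
    print(f"Per die: {capacity_per_die_gb():.0f}GB")
    print(f"Implied B200 capacity: {implied_b200_capacity_gb():.0f}GB")
```

Under these assumptions, each stack works out to 36GB built from 3GB (24Gb) dies, and the implied B200 capacity is 192GB.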