Huawei adds DeepSeek-optimized inference support for its Ascend AI GPUs

On January 27, the same day Nvidia’s stock price plummeted as the market absorbed what DeepSeek’s Chinese-developed LLM meant for the industry, China-based Huawei posted an article announcing that the distilled R1 AI model was available for free via its ModelArts Studio platform. The company explicitly described this version as “Ascend-adapted,” a reference to Huawei’s Ascend data center GPUs.

Huawei doesn’t specify which Ascend GPUs power ModelArts Studio, or which ones serve R1 in particular, but AI industry figures like Yuchen Jin believe it may be the latest Ascend 910C. This new GPU reportedly began sampling to customers in September, so it’s possible the 910C has already been added to Huawei’s cloud servers.

Although R1 was reportedly trained on over two thousand H800 GPUs from Nvidia, it’s significant for Huawei that the company’s GPUs have explicit support for actually running the LLM. This could cut out yet another part of the process where AI firms in China had to rely on Western companies, in this case, Nvidia and AMD, whose GPUs are sought out for both training and inference thanks to their high performance. However, Huawei may be catching up.

“Inference performance on Huawei 910C achieves 60% of the H100's performance from developers [sic] experience,” Jin said on X. “With hand-written CUNN kernels and optimizations, the performance is higher.” Jin also noted that the 910C could be used for training. R1 was officially trained on H800 chips, though that doesn't mean DeepSeek will keep using them indefinitely.

Performance is a significant problem for Nvidia in China, as Biden-era sanctions issued by the US government prevent companies from selling processors that are deemed too fast. Many of Nvidia’s best data center GPUs, like its H200 and B200, can’t be legally exported to China, forcing Nvidia to develop new models specifically for China that just barely meet the performance limit.

In fact, the H800, which DeepSeek claimed to use to train the R1 LLM, was launched after the Biden administration’s initial round of GPU export restrictions on China, in order to offer an alternative to the banned H100. However, the H800 and other Nvidia GPUs for the Chinese market were banned after the next round of sanctions, which lowered the performance cap of chips that could be legally sold in China.

Because of the US government’s export restrictions, Nvidia is forced to compete in China with weaker hardware; the chip company’s flagship for China, the H20, has much less memory, memory bandwidth, and TFLOPS than the H200, the top-end Hopper-based card.

This has apparently had a very real impact on Nvidia’s fortunes in China: in May 2024, Nvidia was selling the H20 for less than Huawei’s Ascend 910B. However, H20 sales were apparently much better in the second half of 2024, with revenue growing by 50% in Q4 compared to Q3, after back-to-back quarters of healthy growth. Either way, Nvidia would certainly be in a better position against its Chinese competitors if it could sell its most powerful GPUs to China.

It’s not just about Nvidia being able to compete in China, though. Being able to run a Chinese LLM with cutting-edge performance on Chinese processors could be a major milestone on the country’s path to technological autarky. If the Ascend 910C or another Chinese GPU proves sufficient for training and inference, there will probably be even less need for processors like the H20. Of course, China won’t be ready to ditch Western chips completely until it makes further progress in chip manufacturing, but companies like Huawei are working on it.