A Raspberry Pi 5 hooked up to an AMD Radeon-powered eGPU has been shown using the graphics hardware to accelerate a large language model (LLM). Of course, it's Pi wizard Jeff Geerling again, and in the video embedded below, he talks us through his experience of using Vulkan API support to enjoy GPU-accelerated local AI on the Raspberry Pi 5.
In our last progress report on the Raspberry Pi 5 connected to an eGPU, we highlighted the modern AAA 4K gaming possibilities of this unlikely pairing. Games like Doom Eternal, Crysis Remastered, Red Dead Redemption 2, and Forza Horizon 4 were demoed running at 4K on our favorite $50 SBC. With most of them struggling to stay above, say, 25 fps, whether you would actually enjoy playing them is another question.
Geerling ended his previous fun and informative video with an update on the Pi 5’s LLM support. He noted that he hadn’t managed to GPU-accelerate any LLMs on the Pi 5, but that smaller models could run on the CPU, in the Pi’s RAM. Moreover, with AMD basically ruling out ROCm support on Arm, prospects didn’t look good.
Thankfully, in the world of enthusiast-driven tech, things can change quickly. In his latest video, Geerling reveals that the answer to GPU-accelerated LLMs on the Pi 5 is the Vulkan API (with an experimental patch). Geerling notes that Vulkan can even outperform AMD's ROCm on systems that offer a choice between the two, so it is by no means merely a poor man’s choice.
At around two minutes into the video, Geerling walks us through his hardware setup. The most esoteric items here are the two boards used to hook up the GPU to the Pi. He used an adaptor to convert the Pi’s PCIe FFC connector to an M.2 slot. Into the M.2 slot, he plugged an M.2 to OCuLink adaptor, with a cable running to a GPU OCuLink riser. In the video, he uses an RX 6700 XT again (you’ll need a spare PC PSU too, among several other bits and pieces).
Software setup is currently a bit more involved, requiring the user to compile their own Linux kernel and gather a handful of drivers and patches, among other steps. More guidance is available via Geerling's blog.
Casting more light onto the benefits of his hardware and software wrangling, the Pi enthusiast and TechTuber provides some performance figures and comparisons.
It is interesting to hear Geerling propose the Pi plus eGPU as an alternative that is almost as fast and efficient as an M1 Max Mac Studio (64GB). He also highlighted that the cost of the whole caboodle is about $700 new, but a lot less if you already have some of the bits and pieces (especially for those with a spare old GPU).
Adding the RTX 4090 benchmark to the mix (second slide) shows how much LLM performance a powerful modern PC can muster. That’s great if you want a 600W system generating hundreds of tokens per second (T/s), but for offline AI at home, 40-60 T/s should be plenty. Moreover, whoever pays your energy bill might be pleased with the ~12W idle power consumption of this efficient Pi-based (Pi 5 plus RX 6700 XT) system.
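For a rough sense of what those figures mean in practice, the short Python sketch below turns them into everyday numbers. It only uses the ~12W idle draw and the 40-60 T/s range quoted above; the 500-token reply length and the assumption that the system sits idle around the clock are ours, chosen purely for illustration, and the function names are hypothetical.

```python
def annual_idle_energy_kwh(idle_watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used in a year at a constant idle draw, in kilowatt-hours."""
    return idle_watts * hours_per_day * 365 / 1000


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_second


# ~12 W idle figure from the article; 24/7 idling is an assumption.
idle_kwh = annual_idle_energy_kwh(12)
print(f"Idle energy per year: {idle_kwh:.2f} kWh")

# 40-60 T/s range from the article; the 500-token reply is a hypothetical example.
for rate in (40, 60):
    print(f"500 tokens at {rate} T/s: {seconds_to_generate(500, rate):.1f} s")
```

Multiply the yearly kWh figure by your own electricity tariff to estimate the idle running cost. Power draw under load will be higher and is not covered by this sketch.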