
We’ve shown previously that you can run ChatGPT on a Raspberry Pi, but the catch is that the Pi is just providing the client side and then sending all your prompts to someone else’s powerful server in the cloud. However, it’s possible to create a similar AI chatbot experience that runs locally on an 8GB Raspberry Pi and uses the same kind of LLaMA language models that power AI on Facebook and other services.
The heart of this project is Georgi Gerganov’s llama.cpp. Written in an evening, this C/C++ model is fast enough for general use, and is easy to install. It runs on Mac and Linux machines and, in this how to, I’ll tweak Gerganov’s installation process so that the models can be run on a Raspberry Pi 4. If you want a faster chatbot and have a computer with an RTX 3000 series or faster GPU, check out our article on how to run a ChatGPT-like bot on your PC.
Managing Expectations
Before you head into this project, I need to manage your expectations. LLaMA on the Raspberry Pi 4 is slow. Loading a chat prompt can take minutes, and responses to questions can take just as long. If speed is what you crave, use a Linux desktop / laptop. This is more of a fun project, than a mission critical use case.
For This Project You Will Need
- Raspberry Pi 4 8GB
- PC with 16GB of RAM running Linux
- 16GB or larger USB drive formatted as NTFS
Setting Up LLaMA 7B Models Using A Linux PC
The first section of the process is to set up llama.cpp on a Linux PC, download the LLaMA 7B models, convert them and then copy them to a USB drive. We need the Linux PC’s extra power to convert the model as the 8GB of RAM in a Raspberry Pi is not enough.
1. On your Linux PC open a terminal and ensure that git is installed.
sudo apt update && sudo apt install git2. Use git to clone the repository.
git clone https://github.com/ggerganov/llama.cpp3. Install a series of Python modules. These modules will work with the model to create a chat bot.
python3 -m pip install torch numpy sentencepiece4. Ensure that you have g++ and build essential installed. These are needed to build C applications.
sudo apt install g++ build-essential5. In the terminal change directory to llama.cpp.
cd llama.cpp6. Build the project files. Press Enter to run.
make7. Download the Llama 7B torrent using this link. I used qBittorrent to download the model.
magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA8. Refine the download so that just 7B and tokenizer files are downloaded. The other folders contain larger models which weigh in at hundreds of gigabytes in size.

9. Copy 7B and the tokenizer files to /llama.cpp/models/.
10. Open a terminal and go to the llama.cpp folder. This should be in your home directory.
cd llama.cpp11. Convert the 7B model to ggml FP16 format. Depending on your PC, this can take a while. This step alone is why we need 16GB of RAM. It loads the entire 13GB models/7B/consolidated.00.pth file into RAM as a pytorch model. Trying this step on an 8GB Raspberry Pi 4 will cause an illegal instruction error.
python3 convert-pth-to-ggml.py models/7B/ 112. Quantize the model to 4-bits. This will reduce the size of the model.
python3 quantize.py 7B13. Copy the contents of /models/ to the USB drive.
Running LLaMA on Raspberry Pi 4

In this final section I repeat the llama.cpp setup on the Raspberry Pi 4, then copy the models across using a USB drive. Then I load an interactive chat session and ask “Bob” a series of questions. Just don’t ask it to write any Python code. Step 9 in this process can be run on the Raspberry Pi 4 or on the Linux PC.
1. Boot your Raspberry Pi 4 to the desktop.
2. Open a terminal and ensure that git is installed.
sudo apt update && sudo apt install git3. Use git to clone the repository.
git clone https://github.com/ggerganov/llama.cpp4. Install a series of Python modules. These modules will work with the model to create a chat bot.
python3 -m pip install torch numpy sentencepiece5. Ensure that you have g++ and build essential installed. These are needed to build C applications.
sudo apt install g++ build-essential6. In the terminal, change directory to llama.cpp.
cd llama.cpp7. Build the project files. Press Enter to run.
make8. Insert the USB drive and copy the files to /models/ This will overwrite any files in the models directory.
9. Start an interactive chat session with “Bob”. Here is where a little patience is required. Even though the 7B model is lighter than other models, it is still a rather weighty model for the Raspberry Pi to digest. Loading the model can take a few minutes.
./chat.sh10. Ask Bob a question and press Enter. I asked it to tell me about Jean-Luc Picard from Star Trek: The Next Generation. To exit press CTRL + C.

 
         
       
         
       
         
       
         
       
       
       
       
       
       
    