Groq, a company that has built custom hardware for running AI language models, is on a mission to deliver faster AI: 75 times faster than the average human can type, to be precise.
Speed matters when it comes to using AI. When you’re having a conversation with an AI chatbot, you want the response in real time. If you’re asking it to compose an email, you want the result in seconds so you can send it off and move on to the next task.
Groq (not to be confused with Elon Musk’s Grok chatbot, and no, the company isn’t happy about the similar names) specializes in developing high-performance processors and software solutions for AI, machine learning (ML), and high-performance computing applications.
So while the Mountain View-based company doesn’t (currently) train its own AI language models, it can make models developed by others run remarkably fast.
How does it achieve this?
Groq uses different hardware than its competition, and that hardware was designed for the software it runs, rather than the other way around.
The company built chips it calls language processing units (LPUs), which are designed for working with large language models (LLMs). Other AI tools usually rely on graphics processing units (GPUs) which, as their name implies, are optimized for parallel graphics processing.
Even when they’re running chatbots, AI companies have used GPUs because they can perform technical calculations quickly and are generally quite efficient. Sticking with the chatbot example, LLMs such as GPT-3 (one of the models ChatGPT uses) work by analyzing a prompt and generating text through a series of predictions about which word should come next.
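To make that word-by-word loop concrete, here’s a toy sketch in Python. A tiny hand-written bigram table stands in for the model; a real LLM scores every token in a huge vocabulary with a neural network, but the generate-one-word-at-a-time shape is the same.

```python
# Toy autoregressive generation: pick each next word based on the
# text so far. A real LLM predicts over a huge vocabulary with a
# neural network; here a tiny hard-coded bigram table stands in
# for the model.
import random

# Hypothetical next-word probabilities, keyed by the previous word.
NEXT_WORD = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.6, "sat": 0.4},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt: str, max_words: int = 5) -> str:
    words = prompt.lower().split()
    for _ in range(max_words):
        choices = NEXT_WORD.get(words[-1])
        if not choices:  # no known continuation, stop generating
            break
        # Sample in proportion to probability, the way an LLM samples
        # from its predicted distribution over the vocabulary.
        words.append(
            random.choices(list(choices), weights=list(choices.values()))[0]
        )
    return " ".join(words)

print(generate("the"))  # e.g. "the dog ran away"
```

Each pass through that loop is one more prediction, which is exactly the step Groq’s hardware is built to make as fast as possible.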
An embedded post on X from February 19, 2024 captures the excitement: “Groq is serving the fastest responses I've ever seen. We're talking almost 500 T/s! I did some research on how they're able to do it. Turns out they developed their own hardware that utilize LPUs instead of GPUs. Here's the skinny: Groq created a novel processing unit known as…”
Since Groq’s LPUs are specifically designed to handle sequences of data (think DNA, music, code, natural language), they perform much better on this work than GPUs. The company claims its users are already running LLMs through its engine and API at speeds up to 10 times faster than GPU-based alternatives.
Try it out
You can try it out for yourself for free, with no software to install, using regular text prompts on Groq’s website.
Groq currently runs Llama 2 (created by Meta), Mixtral-8x7b, and Mistral 7B.
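If you’d rather call the API than use the web demo, Groq exposes an OpenAI-compatible chat completions endpoint. The sketch below is illustrative only: it assumes a GROQ_API_KEY environment variable, and the endpoint URL and the “mixtral-8x7b-32768” model ID reflect Groq’s public documentation at the time of writing, so check the current docs before relying on them.

```python
# Minimal sketch of a chat request against Groq's OpenAI-compatible
# API. Assumes a GROQ_API_KEY environment variable; the endpoint URL
# and the "mixtral-8x7b-32768" model ID match Groq's public docs at
# the time of writing and may change.
import os
import requests

response = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "mixtral-8x7b-32768",
        "messages": [
            {"role": "user", "content": "Explain what an LPU is in one sentence."},
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the request format mirrors OpenAI’s, existing chatbot code can usually be pointed at Groq’s endpoint with little more than a URL and model-name change.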
On X, Tom Ellis, who works at Groq, confirmed that custom models are in the works: “Yes, we're working on it, but we're concentrating on building out our open source model offerings for now,” he wrote on February 19, 2024.