AI accelerators such as Nvidia's H100 can cost up to $30,000, a price point that may be prohibitive for many startups and SMBs. Inference.ai offers organizations access to GPU computing resources in the cloud, allowing them to carry out tasks like deep learning and data analytics without the need for physical hardware.
By collaborating with global data centers, Inference.ai can provide a variety of GPUs from manufacturers including Nvidia, AMD, and Intel. This is a cost-effective solution for companies that cannot afford to buy the necessary hardware outright. This service is also beneficial for those seeking access to future, high-demand GPU architectures like Nvidia’s Blackwell platform.
John Yue, the CEO and co-founder of Inference.ai, has more than ten years of experience in the hardware and infrastructure industry. He is also the founder of Bifrost Cloud, a secure data storage solution. We asked him how Inference.ai began and what the company plans for the future.
- What is Inference.ai offering and why stick to GPUs for now?
Inference.ai provides computing power for both model training and inference as a service. We partner with hundreds of data centers around the world to offer a vast and diverse fleet of GPUs suited to all kinds of inference and model training needs. From the classic NVIDIA options to the new range of inference-specific ASICs being released soon, we aim to help users build the ideal infrastructure setup with our abundant compute resources.
We’ve built our business around GPU services to meet the escalating demand for chips amid an ongoing global shortage. In 2023, we all witnessed the AI model training frenzy that left companies scavenging for dedicated GPU compute. Now that we’re well into 2024, we’ve recognized a major shift in the AI landscape: inferencing will take center stage as AI applications increasingly find their market niche. Inference.ai provides a solution for AI companies seeking GPUs to train their models and then deploy them for inferencing.
- What is Inference.ai’s origin story?
Four years ago, my co-founder Michael Yu and I decided to bet that accelerated computing and cost-effective data storage would be the foundational pillars powering the next decade of innovation. We both studied Computer Science at the University of Waterloo and shared a decade of teamwork in hardware, computing, and infrastructure. Our bets paid off, and together we launched a globally distributed IaaS data storage company. From there, we evolved to meet the rising need for computing power that came with the AI boom, and Inference.ai was born.
Our experience running a data storage solution and our knowledge of industry trends enabled us to pivot easily and adapt to the growing need for GPUs. Further, the connections we made with our data centers opened the doors to the network of GPU suppliers that we partner with today. Once we realized that our customers were hungry for more than just cloud data storage, expanding into the GPU space was a natural step.
- How is your offering different from competitors (big or small)?
Most of our competitors lack a sizable and diverse GPU fleet to effectively accommodate the surging demand for model training and inferencing. We have access to more than 15 different NVIDIA GPU SKUs, and we get the newest releases before our competitors do.
Additionally, with data centers distributed globally, we can ensure low-latency access to computing resources from anywhere in the world. This is crucial for applications requiring real-time processing or collaboration across different geographic locations.
And importantly, we offer the most competitive pricing on the market – 82% cheaper than hyperscalers (Microsoft, Google, and AWS).
- Can you share a story that illustrates how Inference.ai stands out?
One of our first customers, a startup in Seattle, was looking for specialized L40S chips that were coming directly from NVIDIA and being sold only to research labs. The startup called everyone else on the market and nobody had them. When they called us, we were able to pull some strings through our wide network and acquire the exact chips they needed. With our help, they were able to launch ChatDesigner, the first AI image generator dedicated to combining the power of large language models with image generation. The product has been wildly successful so far: within just a few months of launch, it was voted #3 Product of the Day and #4 Design Tool of the Week on Product Hunt.
- Why target startups specifically - and not the wider market?
Inference.ai is not only for startups, but we’re naturally positioned to support them. Most startups don’t have access to Meta’s pocketbook to buy 350,000 H100s, and for the many early-stage startups out there, building credit with a company name that nobody knows yet is extremely difficult. We’re dedicated to providing the computing power to AI visionaries for the best price from our large fleet of GPUs. For startups who can’t afford to wait for GPUs from the big three cloud providers, Inference.ai provides the support they need.
We also have customers who are looking for specific chipsets and those who are struggling to acquire the computing power they need through the big three cloud providers. Inference.ai’s vast and diverse fleet is here to power their ideas and scale their business.
- How do you see the future of inference and training evolving in the next few years?
In 2024, we expect AI products to increasingly find their product-market fit and shift from model training to inferencing, where trained AI models deliver value to users based on new, unseen data. Forward-thinking companies and developers will need to acquire GPUs quickly and economically to meet their inferencing demands, something they’re barely able to do now just to meet their training needs.
- What have you been up to since launch?
Since Inference.ai launched in January, things have been going fantastic. We have a strong (and rapidly growing) roster of customers in the AI space who see real value in the services we provide. Our talented team is growing and we recently moved to a new office space in Palo Alto in the heart of Silicon Valley. We have started our go-to-market outreach and launched a billboard on 101N in San Francisco to increase brand awareness.
We’re also showing strong momentum on the product front. Last month, we released ChatGPU, our proprietary chatbot that guides decision makers through the discovery process and helps them understand which GPUs will best power their compute infrastructure.