TechRadar
Wayne Williams

Intel's former CEO puts money into a little-known hardware startup that wants to make Nvidia obsolete

  • UK-based Fractile is backed by NATO and wants to build faster, cheaper in-memory AI compute
  • Nvidia's brute-force GPU approach consumes too much power and is held back by memory
  • Fractile's original numbers were benchmarked against clusters of H100 GPUs, not the newer H200

Nvidia sits comfortably at the top of the AI hardware food chain, dominating the market with its high-performance GPUs and CUDA software stack, which have quickly become the default tools for training and running large AI models. That dominance comes at a cost, though: a growing target on its back.

Hyperscalers like Amazon, Google, Microsoft and Meta are pouring resources into developing their own custom silicon in an effort to reduce their dependence on Nvidia’s chips and cut costs. At the same time, a wave of AI hardware startups is trying to capitalize on rising demand for specialized accelerators, hoping to offer more efficient or affordable alternatives and, ultimately, to displace Nvidia.

You may not have heard of UK-based Fractile yet, but the startup, which claims its revolutionary approach to computing can run the world’s largest language models 100x faster and at 1/10th the cost of existing systems, has some pretty noteworthy backers, including NATO and the former CEO of Intel, Pat Gelsinger.

Removing every bottleneck

“We are building the hardware that will remove every bottleneck to the fastest possible inference of the largest transformer networks,” Fractile says.

"This means the biggest LLMs in the world running faster than you can read, and a universe of completely new capabilities and possibilities for how we work that will be unlocked by near-instant inference of models with superhuman intelligence.”

Fractile’s performance numbers were originally based on comparisons with clusters of Nvidia H100 GPUs using 8-bit quantization and TensorRT-LLM, running Llama 2 70B, not the newer H200 chips. However, Fractile tells us its latest system architecture is projected to deliver a 50x improvement in per-user latency (tokens per second per user) and over 10x the overall inference throughput per megawatt of the next-generation Blackwell systems Nvidia is shipping in the latter half of 2025.

In a LinkedIn post, Gelsinger, who recently joined VC firm Playground Global as a General Partner, wrote, “Inference of frontier AI models is bottlenecked by hardware. Even before test-time compute scaling, cost and latency were huge challenges for large scale LLM deployments... To achieve our aspirations for AI, we will need radically faster, cheaper and much lower power inference.”

“I’m pleased to share that I’ve recently invested in Fractile, a UK-founded AI hardware company who are pursuing a path that’s radical enough to offer such a leap,” he continued.

"Their in-memory compute approach to inference acceleration jointly tackles the two bottlenecks to scaling inference, overcoming both the memory bottleneck that holds back today’s GPUs, while decimating power consumption, the single biggest physical constraint we face over the next decade in scaling up data center capacity. In fact, some of the ideas I was exploring in my graduate work at Stanford University will now come to mainstream AI computing!”
