Hello and welcome to Eye on AI. In this edition…What both the true believers and the doubters get wrong about today's AI...Nvidia wants to dominate ‘embodied AI’ just as it has datacenter-based AI...The innovations behind DeepSeek V3's impressive performance...Can AI give the little guy a leg up?
Among AI industry insiders, opinion about AI progress tends to bifurcate. In one camp are folks such as Sam Altman. The OpenAI CEO wrote a blog post on his personal website over the weekend reflecting on OpenAI’s trajectory, especially over the past two years. In the post, Altman stated that artificial general intelligence, which OpenAI defines as a single AI software system that can perform as well as or better than people at most economically useful cognitive tasks, was essentially a solved problem. “We are now confident that we know how to build AGI as we have traditionally understood it,” he wrote. And Altman predicted that in 2025 “we may see the first AI agents ‘join the workforce’ and materially change the output of companies.”
The other camp is deeply skeptical of the value of today’s AI software. Gary Marcus, the AI authority and emeritus NYU professor of cognitive psychology, may be the best example of these doubters. Marcus has recently used his blog to point out several key challenges facing today’s AI that he doesn’t think will be solved anytime soon. These include common-sense reasoning, as well as AI’s continued unreliability, its inability to generalize to data different from what it encountered during training, and its difficulty with understanding compositionality (how parts constitute a whole). Given these shortcomings, along with the high cost of the most advanced AI systems, Marcus has often wondered whether AI will ever find much use in business outside a few niche settings.
But several conversations I had in the closing weeks of last year with business executives who are using AI at scale at their companies made it clear to me that, when it comes to business applications of AI, we should be neither as optimistic as Altman nor as pessimistic as Marcus. The business executives I spoke to all reported finding ways to wring significant business value from today’s AI software, despite the shortcomings Marcus has highlighted.
But how they did this was far more complicated and involved far more human engineering work—and, often, a good deal more expense—than what one might think from Altman’s pronouncements about how close AGI is. In no case could one simply use one of the foundation models straight out of the box and have it reliably solve business problems.
Prosus’s Toqan System
One of the people I spoke to was Euro Beinat, executive vice president and global head of AI and data at Prosus, the Netherlands-based technology investment firm whose portfolio includes dozens of tech startups worldwide. I like talking to Beinat because Prosus’s diverse portfolio gives him a good vantage point from which to gauge how AI is being adopted across different kinds of companies—from food delivery apps to ecommerce plays to fintechs—and across different job functions. I also like speaking to him because he is unusually candid about what has worked, and what hasn’t.
Beinat said that in the past year, Prosus has rolled out its Toqan AI system to 25,000 employees across its various portfolio companies. Employees can use Toqan to do everything from answering questions about employment policies to drafting marketing surveys to assisting human customer support agents in finding the right documentation to answer customer queries. One of Prosus’s companies, OLX Poland, has created an agentic AI system, called OLX Magic, that helps walk sellers through the process of posting a listing or, in the case of a potential buyer, helps them shop, letting them specify what they are looking for in natural language and have a “conversation” about the options with an AI chatbot, rather than using a traditional search.
Using multiple models and an “agentic workflow”
Of course, one of the things that has held AI adoption in business back, as Marcus rightly points out, is reliability. Few business use cases can tolerate the 10% to 25% inaccuracy rates that many large language models (LLMs) produce when used without any other interventions. Through an iterative process of improvement, including building better guardrails and updating the AI models it was using, Prosus gradually brought Toqan’s hallucination rate down from 10% in 2022 to 2.5%. But to get it down further, Beinat says, Prosus had to change how the entire system is engineered to build a more “agentic workflow.”
That process involves having an AI model that reasons about the nature of the question it’s being asked and decides whether the question can be passed to an LLM to answer directly or whether it requires the agentic workflow. If it does, the model breaks the task into discrete parts and assigns each part to a different AI “agent” (either a model that has been fine-tuned for a specific task or an LLM that has been prompted to play a particular role and perhaps given a specific software tool to help complete that task). Then comes a “reflection phase,” in which an AI model checks the overall result of the workflow for errors, repeating the entire process if it finds any. Using this system, Prosus has reduced hallucinations to 1.5%.
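The workflow Beinat describes (route, decompose, delegate to agents, reflect, retry) can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not Toqan’s actual implementation: every name below (`run_query`, `needs_workflow`, `pick_agent`, `find_errors`, `max_rounds` and so on) is hypothetical, the retry cap is an added assumption, and the toy agents simply tag text rather than call real models.

```python
from typing import Callable

Agent = Callable[[str], str]


def run_query(
    question: str,
    *,
    needs_workflow: Callable[[str], bool],
    answer_directly: Agent,
    decompose: Callable[[str], list[str]],
    pick_agent: Callable[[str], Agent],
    combine: Callable[[list[str]], str],
    find_errors: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return (answer, rounds); rounds == 0 means one direct LLM call was enough."""
    # Routing step: simple questions skip the slower, token-hungry workflow.
    if not needs_workflow(question):
        return answer_directly(question), 0
    result = ""
    for round_no in range(1, max_rounds + 1):
        # Break the task into discrete parts and hand each to a specialised agent.
        parts = decompose(question)
        outputs = [pick_agent(part)(part) for part in parts]
        result = combine(outputs)
        # Reflection phase: check the combined result; repeat everything on error.
        if not find_errors(question, result):
            return result, round_no
    # Hypothetical cap: stop after max_rounds (a real system might escalate here).
    return result, max_rounds


# --- Toy stand-ins (hypothetical, for illustration only) ---
AGENTS: dict[str, Agent] = {
    "policy": lambda part: f"[policy] {part}",
    "calc": lambda part: f"[calc] {part}",
}


def make_flaky_reflector() -> Callable[[str, str], bool]:
    """Reports an error on the first check only, so the retry path runs once."""
    calls = {"n": 0}

    def find_errors(question: str, result: str) -> bool:
        calls["n"] += 1
        return calls["n"] == 1

    return find_errors


TOY = dict(
    needs_workflow=lambda q: " and " in q,
    answer_directly=lambda q: f"[llm] {q}",
    decompose=lambda q: [p.strip() for p in q.split(" and ")],
    pick_agent=lambda part: AGENTS["calc"] if any(c.isdigit() for c in part) else AGENTS["policy"],
    combine=lambda outs: "; ".join(outs),
)

answer, rounds = run_query(
    "Summarise the leave policy and compute 3 * 4",
    find_errors=make_flaky_reflector(),
    **TOY,
)
print(answer, rounds)
```

Each extra reflection round repeats every agent call, which is one plausible way to see why Beinat says the approach is slower and uses far more tokens than a single direct answer.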
But, Beinat warns, “it is slower to do this and a lot more expensive in terms of token usage” than simply giving the question straight to an LLM and having it answer. Overall, the number of tokens used per query is now 2.5 times what it was. Meanwhile, thanks to price wars among cloud providers, the average price per token has slightly more than halved. So, on average, the system is only about 10% more expensive today than it was in early 2023.
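These cost figures can be checked with back-of-envelope arithmetic, since cost per query is roughly tokens per query multiplied by price per token. The sketch assumes that simple model (it ignores, for example, the different prices providers often charge for input and output tokens). The exact per-token price decline is not reported, so the 0.44 figure is derived from the 2.5x and roughly 10% numbers rather than taken from Prosus.

```python
def cost_multiplier(token_multiplier: float, price_multiplier: float) -> float:
    """Relative cost per query, assuming cost = tokens per query x price per token."""
    return token_multiplier * price_multiplier


def implied_price_multiplier(token_multiplier: float, observed_cost_multiplier: float) -> float:
    """Per-token price change implied by a token increase and an overall cost change."""
    return observed_cost_multiplier / token_multiplier


TOKENS_PER_QUERY = 2.5  # reported: 2.5 times as many tokens per query

# If the per-token price had fallen by exactly half, cost per query would rise 25%.
print(cost_multiplier(TOKENS_PER_QUERY, 0.5))  # 1.25

# A ~10% net increase implies the price fell to ~44% of its old level (a ~56% cut).
print(round(implied_price_multiplier(TOKENS_PER_QUERY, 1.10), 2))  # 0.44
```

A drop of about 56% is consistent with a price that “slightly more than halved.”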
Measuring ROI
The lower hallucination rate is probably worth the cost, he says. When Toqan was initially rolled out, it was embraced mostly by engineers, while people in other domains, such as human resources and legal, were reluctant to use it. Beinat says he thinks this was because engineers, due to the nature of their work, often had an intuitive sense of when they could trust the model’s output, whereas in other areas, detecting hallucinations was more difficult and the chance of errors made people hesitant to use Toqan. Now, with the lower hallucination rate, the majority of Toqan users are from non-engineering roles. Still, Beinat warns, managers should not expect AI’s impacts to be apparent immediately after a system is introduced. Prosus has found that on average it takes six months of learning and experimentation for users to figure out how to use these new AI tools most effectively in their particular role, he says.
And even then, Beinat acknowledges, figuring out the return on investment from AI is difficult. So far, he says, Prosus data shows that Toqan saves about 48 minutes on average per user per day. That’s not nothing, but he says the problem is that those 48 minutes “are spread all over the place. There are all these microbursts of productivity.” And the value of those saved minutes varies a lot depending on the use case. Prosus has calculated that, right now, the cost of those 48 saved minutes per day is about $12 per user per month, which he says is definitely worth it.
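One rough way to frame the ROI question is to convert the reported time savings into hours per month and set them against the $12 monthly cost. The 21 working days per month and the $30 hourly value of an employee’s time used here are illustrative assumptions, not Prosus data, and the calculation treats every saved minute as equally valuable, which is precisely what Beinat cautions is not the case when savings arrive as scattered microbursts.

```python
def monthly_hours_saved(minutes_per_day: float, working_days: int) -> float:
    """Convert a daily time saving into hours saved per month."""
    return minutes_per_day * working_days / 60


def cost_per_hour_saved(monthly_cost: float, minutes_per_day: float, working_days: int) -> float:
    """What each saved hour costs, given the tool's monthly per-user cost."""
    return monthly_cost / monthly_hours_saved(minutes_per_day, working_days)


def net_monthly_value(hourly_value: float, monthly_cost: float,
                      minutes_per_day: float, working_days: int) -> float:
    """Value of saved time minus tool cost, if every saved hour is worth hourly_value."""
    return hourly_value * monthly_hours_saved(minutes_per_day, working_days) - monthly_cost


MINUTES_SAVED_PER_DAY = 48    # reported by Prosus
MONTHLY_COST_PER_USER = 12.0  # reported by Prosus, in dollars
WORKING_DAYS = 21             # assumption
HOURLY_VALUE = 30.0           # assumption: illustrative value of an hour of employee time

print(monthly_hours_saved(MINUTES_SAVED_PER_DAY, WORKING_DAYS))  # 16.8
print(round(cost_per_hour_saved(MONTHLY_COST_PER_USER, MINUTES_SAVED_PER_DAY, WORKING_DAYS), 2))  # 0.71
print(round(net_monthly_value(HOURLY_VALUE, MONTHLY_COST_PER_USER,
                              MINUTES_SAVED_PER_DAY, WORKING_DAYS), 2))  # 492.0
```

Under these assumptions, any saved hour worth more than about $0.71 covers the tool’s cost; the harder question Beinat raises is whether those scattered minutes actually turn into more output.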
Reducing the cost of growth
Still, 48 minutes each day doesn’t seem like a game changer. That’s why, he says, he often likes to highlight individual use cases, where AI’s transformative impact is more apparent. He points to iFood, a Brazil-based food delivery app Prosus owns. iFood told its employees that if they had a data analytics question, they should try asking Toqan before sending it on to a human data analyst. The company discovered that 70% of these questions could be solved by Toqan. iFood still employs plenty of data analysts who handle the questions Toqan can’t, but their backlog has now been reduced and the capacity of those human data analysts is less of a bottleneck. And, of course, savings such as this mean that iFood can grow without hiring as many new employees. In essence, AI reduces the cost per dollar of revenue generated.
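The iFood arrangement is essentially an “AI first, human second” triage queue. Here is a minimal sketch, assuming a hypothetical `try_ai` function that returns an answer, or `None` when the model can’t handle a question. In practice, deciding reliably when to escalate is itself the hard part, and nothing here reflects how Toqan actually makes that call.

```python
from collections import deque
from typing import Callable, Optional


def triage(questions: list[str],
           try_ai: Callable[[str], Optional[str]]) -> tuple[dict[str, str], deque[str]]:
    """Send every question to the AI first; only unresolved ones join the analysts' queue."""
    answered: dict[str, str] = {}
    analyst_queue: deque[str] = deque()
    for question in questions:
        reply = try_ai(question)
        if reply is None:
            analyst_queue.append(question)  # escalate to a human data analyst
        else:
            answered[question] = reply
    return answered, analyst_queue


# Toy stand-in (hypothetical): the "AI" resolves 7 of every 10 questions,
# mirroring the 70% share iFood reported.
def toy_ai(question: str) -> Optional[str]:
    n = int(question.removeprefix("q"))
    return None if n % 10 >= 7 else f"answer to {question}"


resolved, analyst_backlog = triage([f"q{i}" for i in range(20)], toy_ai)
print(len(resolved), len(analyst_backlog))  # 14 6
```

With 70% of questions resolved upstream, the analysts see roughly 30% of the former inflow, which is the capacity effect iFood describes.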
It’s this kind of business logic that is too often lost in Marcus’s pessimistic takes on today’s AI—while the significant effort it takes to deliver that cost reduction is often glossed over in Altman’s rosy statements about AGI being solved.
With that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
***
Before we get to the news: at the Fortune Brainstorm Tech dinner last night at CES in Las Vegas, legendary investor Mark Cuban treated the audience to his pearls of wisdom on everything from shaking up how pharmaceuticals are sold in America to the impact AI is having on his companies. You can check out the video of his talk on Fortune’s website here.
***
Also, a correction: Last Thursday’s (Jan. 2, 2025) edition of the newsletter on corporate cybersecurity training erroneously stated that none of the training courses reviewed addressed the emergence of deepfakes in live video calls. One training video, from Ninjio, did address this threat. We regret the error.