Fortune
Jeremy Kahn

If your mother says she loves you

Photo of Microsoft CEO Satya Nadella. (Credit: SeongJoon Cho—Bloomberg via Getty Images)

City News Bureau of Chicago, a now-defunct news outfit once legendary as a training ground for tough-as-nails, shoe-leather reporters, famously had as its unofficial motto: “If your mother says she loves you, check it out.” Thanks to the advent of ChatGPT, the new Bing Search, Bard, and a host of copycat search chatbots based on large language models, we are all going to have to start living by City News’ old shibboleth.

Researchers already knew that large language models were imperfect engines for search queries, or any fact-based request really, because of their tendency to make stuff up (a phenomenon A.I. researchers call “hallucination”). But the world’s largest technology companies have decided that the appeal of dialogue as a user interface—and the ability of these large language models to perform a vast array of natural language-based tasks, from translation to summarization, along with the potential to couple these models with access to other software tools that will enable them to perform tasks (whether it is running a search or booking you theater tickets)—trumps the potential downsides of inaccuracy and misinformation.

Except, of course, there can be real victims when these systems hallucinate—or even when they don’t, but merely pick up something that is factually wrong from their training data. Stack Overflow had to ban users from submitting answers to coding questions that were produced using ChatGPT after the site was flooded with code that looked plausible but was incorrect. The science fiction magazine Clarkesworld had to stop taking submissions because so many people were submitting stories crafted not by their own creative genius, but by ChatGPT. Now a German company called OpenCage—which offers an application programming interface that does geocoding, converting physical addresses into latitude and longitude coordinates that can be placed on a map—has said it has been dealing with a growing number of disappointed users who have signed up for its service because ChatGPT erroneously recommended its API as a way to look up the location of a mobile phone based solely on the number. ChatGPT even helpfully wrote Python code that let users call OpenCage’s API for this purpose.

But, as OpenCage was forced to explain in a blog post, this is not a service it offers, nor one that is even feasible using the company’s technology. OpenCage says that ChatGPT seems to have developed this erroneous belief because it picked up on YouTube tutorials in which people also wrongly claimed OpenCage’s API could be used for reverse mobile phone geolocation. But whereas those erroneous YouTube tutorials only convinced a few people to sign up for OpenCage’s API, ChatGPT has driven people to OpenCage in droves. “The key difference is that humans have learned to be skeptical when getting advice from other humans, for example via a video coding tutorial,” OpenCage wrote. “It seems though that we haven’t yet fully internalized this when it comes to AI in general or ChatGPT specifically.” I guess we better start internalizing.

Meanwhile, after a slew of alarming publicity about the dark side of its new, OpenAI-powered Bing chat feature—where the chatbot calls itself Sydney, becomes petulant, and at times even downright hostile and menacing—Microsoft has decided to restrict the length of conversations users can have with Bing chat. But as I and many others have found, while this arbitrary restriction on the length of a dialogue apparently makes the new Bing chat safer to use, it also makes it a heck of a lot less useful.

For instance, I asked Bing chat about planning a trip to Greece. I was in the process of trying to get it to detail timings and flight options for an itinerary it had suggested when I suddenly hit the message “Oops, I think we've reached the end of this conversation. Click 'New topic,' if you would!”

The length restriction is clearly a kluge that Microsoft has been forced to implement because it didn’t do rigorous enough testing of its new product in the first place. And there are huge outstanding questions about exactly what Prometheus, the name Microsoft has given to the model that powers the new Bing, really is, and what it is really capable of (no one is claiming the new Bing is sentient or self-aware, but there’s been some very bizarre emergent behavior documented with the new Bing, even beyond the Sydney personality, and Microsoft ought to be transparent about what it understands and doesn’t understand about this behavior, rather than simply pretending it doesn’t exist). Microsoft has been cagey in public about how it and OpenAI created this model. No one outside of Microsoft is exactly sure why it is so prone to taking on the petulant Sydney persona, especially when ChatGPT, based on a smaller, less capable large language model, seems so much better behaved—and again, Microsoft is saying very little about what it does know.

(Earlier research from OpenAI had found that smaller models trained on better-quality data often produced results that human users much preferred, even though those models scored lower than larger ones on a number of benchmark tests. That has led some to speculate that Prometheus is OpenAI’s GPT-4, a model believed to be many times more massive than any OpenAI has previously debuted. But if that is the case, there is still a real question about why Microsoft opted to use GPT-4 rather than a smaller but better-behaved system to power the new Bing. And frankly, there is also a real question about why OpenAI might have encouraged Microsoft to use the more powerful model if it in fact realized that model had more potential to behave in ways that users might find disturbing. The Microsoft folks may have, like many A.I. researchers before them, become blinded by stellar benchmark performance, which can confer bragging rights among other A.I. developers but is a poor proxy for what real human users want.)

What is certain is that if Microsoft doesn’t fix this soon—and if someone else, such as Google, which is hard at work trying to hone its search chatbot for imminent release, or any of the others, including startups such as Perplexity and You.com, that have debuted their own chatbots, shows that their chatbot can hold long dialogues without it turning into Damien—then Microsoft risks losing its first mover advantage in the new search wars.  

Also, let’s just take a moment to appreciate the irony that it's Microsoft, a company that once prided itself, not without reason, on being among the most responsible of the big technology companies, which has now tossed us all back to the bad old “move fast and break things” days of the early social media era—with perhaps even worse consequences. (But I guess when your CEO is obsessed with making his arch-rival “dance” it is hard for the musicians in the band to argue that maybe they shouldn’t be striking up the tune just yet.) Beyond OpenCage, Clarkesworld, and Stack Overflow, people could get hurt from incorrect advice on medicines, from abusive Sydney-like behavior that drives someone to self-harm or suicide, or from reinforcement of hateful stereotypes and tropes.

I’ve said this before in this newsletter, but I’ll say it again: Given these potential harms, now is the time for governments to step in and lay down some clear regulation about how these systems need to be built and deployed. The idea of a risk-based approach, such as that broached in the original draft of the European Union’s proposed A.I. Act, is a potential starting point. But the definitions of risk and those risk assessments should not be left entirely up to the companies themselves. There need to be clear external standards and clear accountability if those standards aren’t met.

With that, here’s the rest of this week’s A.I. news.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com
