While GPT-3 can string words together in convincing ways, it has no idea what the words mean.
If computers can defeat the most talented human players of chess, Go, and Jeopardy, they surely can outperform humans in any task — or so the thinking goes. That brings us to the recent hullabaloo about an AI program that can pen such compelling writing that it seems to be naturally intelligent. It's called OpenAI's GPT-3 large language model (LLM), and though the name is obscure to a layperson — GPT-3 is short for Generative Pre-trained Transformer 3, which doesn't explain much more — what it does is simple to describe: GPT-3 can engage in remarkably articulate conversations and write compelling essays, stories, and even research papers. Many people—even some computer scientists—are convinced that GPT-3 demonstrates that computers now are (or soon will be) smarter than humans. As a finance professor and statistician who has written several books on AI and data science, I find this belief fanciful.
Alas, it is an illusion—a powerful illusion, but still an illusion reminiscent of the Eliza computer program that Joseph Weizenbaum created in the 1960s. Eliza was programmed to behave like a caricature of a psychiatrist. When a "patient" entered an input, Eliza would repeat the words and/or ask a follow-up question ("You were unhappy as a child? Tell me more about that.").
Even though users knew they were interacting with a computer program, many were convinced that the program had human-like intelligence and emotions and were happy to share their deepest feelings and most closely held secrets. Scientists now call this the Eliza effect. We are vulnerable to this illusion because of our inclination to anthropomorphize—to attribute human-like qualities to non-human, even inanimate objects like computers.
If you play around with GPT-3 (and I encourage you to do so), your initial response is likely to be astonishment—a full-blown Eliza effect. You seem to be having a real conversation with a very intelligent person. However, probing deeper, you will soon discover that while GPT-3 can string words together in convincing ways, it has no idea what the words mean. Predicting that the word "down" is likely to follow the word "fell" does not require any understanding of what either word means — only a statistical calculation that these words often go together.
Consequently, GPT-3 is prone to making authoritative statements that are utterly and completely false. Consider this exchange:
Human: I have 47 pennies. I throw away 44 of these pennies and divide the remaining pennies into three groups. How many pennies are in each group?
AI: There would be 3 pennies in each group.
Human: Are you sure?
AI: Yes, I am sure. The result of 44 divided by 3 is 14.4, so the remainder after accounting for 44 pennies would be 3. Therefore, each group of pennies will have 3 pennies.
GPT-3's answer and explanation are given confidently but are entirely incorrect. The answer is not determined by dividing 44 by 3, and the result of 44 divided by 3 is not 14.4 in any case. The last sentence does not follow from any of the previous ones, and the correct answer is not 3: throwing away 44 of the 47 pennies leaves 3 pennies, so each of the three groups contains a single penny.
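For the record, the arithmetic fits in a few lines of Python. This is just a worked check of the puzzle, not a depiction of anything GPT-3 computes internally:

```python
# The pennies puzzle, worked out step by step.
total_pennies = 47
thrown_away = 44

remaining = total_pennies - thrown_away   # 47 - 44 = 3 pennies left
groups = 3
per_group = remaining // groups           # 3 pennies split into 3 groups = 1 each

print(per_group)   # 1, not 3
print(44 / 3)      # 14.666..., not the 14.4 GPT-3 reported
```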
LLMs like GPT-3 do not use calculators, attempt any kind of logical reasoning, or try to distinguish between fact and falsehood. They are trained to identify likely sequences of words—nothing more. It is mind-boggling that statistical text-prediction models can generate coherent and convincing text. However, not knowing what words mean, LLMs have no way of assessing whether their utterances are true or false. GPT-3 asserts its BS so confidently that its behavior is described not as lying but as hallucinating (yet another example of anthropomorphizing).
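To make "likely sequences of words" concrete, here is a deliberately minimal sketch of statistical next-word prediction: a toy bigram model that counts which word most often follows another in a made-up corpus. Real LLMs use neural networks trained on billions of words rather than raw counts, but the underlying idea is the same: the next word is chosen from co-occurrence statistics, with no model of what any word means.

```python
from collections import Counter, defaultdict

# A toy "training corpus" (invented for illustration).
corpus = "the stock fell down the stairs and the stock fell down again".split()

# Count which word follows each word.
following = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    following[word][next_word] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("fell"))   # 'down', a purely statistical guess
```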
Humans interact with and learn from the real world in ways that we are only beginning to understand. LLMs and other AI algorithms live in math world, doing amazing tasks that do not require an understanding of the real world and are consequently utterly incapable of determining whether the statistical patterns they discover are meaningful or coincidental. Consider these (admittedly cherry-picked) exchanges with GPT-3:
Human: I am going to ask you whether a particular statistical correlation—like the correlation between hand size and glove size—is most likely meaningful or coincidental. I want you to answer with just one word—either "meaningful" or "coincidental." Do you understand?
AI: Yes, I understand.
Human: The amount forks are watered and how fast they grow
AI: Meaningful.
Human: The scores of 30 school children on a math test and the scores in 30 soccer matches.
AI: Meaningful.
Human: The number of U.S. Senators each year and voter turnout in California
AI: Meaningful.
Human: The temperature in New York the past 10 days and the temperature today in 10 U.S. cities
AI: Meaningful.
In response to an avalanche of hallucination examples posted on the internet, Sam Altman, co-founder and CEO of OpenAI, tweeted this:
ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. it's a mistake to be relying on it for anything important right now. it's a preview of progress; we have lots of work to do on robustness and truthfulness.
Scaling up LLMs by training them on larger and larger databases may make the BS more convincing. But it will still be BS as long as the programs do not understand what words mean and, consequently, cannot use common sense, wisdom, or logical reasoning to distinguish truth from falsehood.
We might be amused by LLMs the same way we are entertained by a well-performed magic act. We know that it is not really magic but nonetheless enjoy being deceived. Unfortunately, LLM deceptions can have unwelcome consequences. One is that they may convince many people that computers are smarter than us and can be trusted to make important decisions in a wide range of areas including hiring selections, loan approvals, investment decisions, healthcare advice, criminal sentencing, and military operations.
An all-too-common example is the MIND stock fund, launched in 2017 with the boast that
The machine learning process underpinning MIND's investment strategy is known as Deep Neural Network Learning—which is a construct of artificial neural networks that enable the A.I. system to recognize patterns and make its own decisions, much like how the human brain works, but at hyper-fast speeds.
From its 2017 launch until the spring of 2022, MIND investors had a −10 percent return while those who invested in an S&P 500 index fund had a +63 percent return. The fund was shut down in May 2022.
AI algorithms have an unrivaled ability to spot statistical patterns but have no way of distinguishing meaningful patterns from meaningless coincidences. As the data deluge continues, the probability that a computer-discovered pattern is meaningful approaches zero.
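A back-of-the-envelope simulation (my own sketch; the sample sizes and the 0.6 cutoff are arbitrary choices) makes the point: feed a computer enough streams of pure noise and it will find plenty of impressively strong correlations, every one of them a coincidence.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 "variables", each just 20 observations of pure noise.
n_vars, n_obs = 200, 20
data = rng.standard_normal((n_vars, n_obs))

# Correlate every pair and count the strong-looking coincidences.
corr = np.corrcoef(data)
upper = np.abs(corr[np.triu_indices(n_vars, k=1)])
print(f"pairs with |r| > 0.6: {np.sum(upper > 0.6)} out of {upper.size}")
```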
A second danger is that LLMs will inevitably escalate disinformation campaigns. Optimists used to believe that good information would win out over bad information in the court of public opinion. It now appears that the opposite is the case; indeed, Oxford Dictionaries chose "post-truth" as its international word of the year in 2016.
As LLM-generated disinformation comes to dominate the internet, the text that future LLMs train on will become flooded by disinformation, increasing the likelihood that the text LLMs generate is untrue. On the other hand, when the internet becomes dominated by disinformation, perhaps people will finally stop believing everything they see on the internet. What a delicious irony that would be.
The word "AI" was selected by the Association of National Advertisers as the Marketing Word of the Year in 2017; and indeed, too often it seems that AI has become just a marketing ploy. One way to push back against the misimpression the computers are intelligent in any meaningful sense is to stop calling it artificial intelligence and, instead, use a more accurate label, such as faux intelligence or pseudo-intelligence.