AI Just Passed The Famous Turing Test
Headlines about the growing power of Large Language Models (LLMs) like ChatGPT seem to be a daily occurrence, sparking both excitement and concern. A particularly significant story emerged earlier this year from a paper describing how an LLM passed the Turing Test, a foundational experiment for gauging machine intelligence.
The model in question, ChatGPT 4.5, successfully convinced nearly 75% of human participants that it was the real person in a side-by-side comparison. This is a monumental claim, but what does it truly signify?
What Exactly Is the Turing Test
To understand the implications, we must first look at what the Turing Test is. Cameron Jones, a postdoctoral student at UC San Diego and co-author of the study, explains that the test originates from Alan Turing's groundbreaking 1950 paper, “Computing Machinery and Intelligence.” Turing sought to replace the ambiguous question “Can machines think?” with a more concrete one: “Can a machine act in such a manner that its actions are indistinguishable from those of a human?”
His proposed experiment, which he called “The Imitation Game,” involves a human interrogator communicating in writing with two hidden witnesses: one human and one machine. The interrogator's goal is to determine which is which. A machine is said to have passed the test if it can consistently fool the interrogator.
Alan Turing, the pioneering English mathematician and computer scientist whose ideas shaped modern computing. Image: Public Domain
In the recent experiment, ChatGPT 4.5 was remarkably successful, with 73% of participants identifying it as the human. Another model, LLaMa-3.1-405B, also passed, fooling 56% of people.
Intelligence vs Imitation The Core Debate
Passing the test doesn't automatically equate to true machine intelligence or consciousness. Turing's argument was more nuanced. He reasoned that since we judge other humans as intelligent based on their behavior (as we cannot access their minds directly), we should apply the same standard to a machine whose behavior is indistinguishable from a human's. If we can't tell the difference, he argued, we have to consider the possibility that the machine is intelligent.
This behaviorist perspective has been a key point of contention for decades. The central question is whether we attribute intelligence based on behavior alone.
Why Language Is the Ultimate Test
Turing's choice of written language was deliberate. It neutralizes physical differences between humans and machines, focusing solely on the output of thought. For all of history, coherent language has been inextricably linked to human cognition. As Rusty Foster noted in a recent essay, “we have never been required to distinguish between ‘language’ and ‘thought’ because only thought was capable of producing language.”
This deep-seated association makes it incredibly difficult not to attribute some level of intelligence to a machine that can 'talk' to us. Are these advanced LLMs just sophisticated parrots, mimicking patterns without understanding? Even with a real parrot, our instinct is to talk back, despite knowing it doesn't comprehend our words.
“It’s a super behaviorist perspective on what intelligence is—that to be intelligent is to display intelligent behavior,” says Jones. “And so you might want to have other conditions: You might require that a machine produce the behavior in the right kind of way.”
A parrot can mimic human language, but that doesn’t mean it understands what it’s saying. Image: DepositPhotos
The Chinese Room Thought Experiment
One of the most famous challenges to the Turing Test's premise is John Searle’s Chinese Room thought experiment from 1980. Searle imagined a person who doesn't speak Chinese locked in a room. They receive Chinese characters under the door and, using a complex rulebook, produce the correct corresponding characters as a response. To an outsider, it appears the person in the room is a fluent Chinese speaker, but in reality, they understand nothing. They are simply executing a program.
Searle's point was to draw a sharp distinction between appearing to understand and genuine understanding, a direct rebuttal of Turing's behavior-based assessment.
Tweaking AI to Seem More Human
Interestingly, the researchers in the recent study had to fine-tune the LLMs to pass the test. A key challenge was, as Jones puts it, “getting [the model] to not do stuff that ChatGPT does.” This involved teaching the model to use sentence fragments and casual language typical of text conversations.
The team even experimented with making ChatGPT produce typos to appear more human, which proved difficult. “If you just tell an LLM to try really hard to make spelling errors, they do it in every word, and the errors are really unconvincing,” Jones noted.
These adjustments highlight that passing the test was as much about imitating human imperfections as it was about demonstrating intelligence.
Even the computer programmers that created artificial intelligence don’t know how it works. Credit: TED-Ed
A Startlingly Accurate Prediction
In his 1950 paper, Turing made a prescient prediction: “I believe that in about 50 years’ time it will be possible to programme computers…to make them play the imitation game so well that an average interrogator will not have more than [a] 70 per cent chance of making the right identification after five minutes of questioning.”
He was off by about 25 years, but the core prediction has now come to pass. A machine can indeed fool an average person about 70% of the time.
So What Makes Human Intelligence Unique
This brings us back to the central question: what does this milestone mean? “That’s a question I’m still struggling with,” Jones admits. The result is undeniable evidence that models can imitate human behavior so well that we can’t tell the difference. This has profound social implications.
However, it doesn't settle the philosophical debate. Passing the test is not sufficient proof of intelligence or consciousness. If we reject the Turing Test, do we have a better way to identify genuine artificial intelligence? Most would agree that consciousness involves more than just behavior.
Ultimately, the quest to define machine intelligence forces us to confront how little we understand our own. We hold a firm belief in human uniqueness, yet many traits we once thought were exclusively ours—like tool use, complex societies, and empathy—have been observed in other animals. The source of our own consciousness remains a mystery. To know if a machine can truly think, we may first need a much deeper understanding of how we do it ourselves.