Is the Turing Test Dead?

Researchers wonder whether improved large language models require new tests for machine intelligence

A photo illustration of a person looking at a glowing box in their hand.
Daniel Zender

When in 1950 Alan Turing first proposed an approach to distinguish the “minds” of machines from those of human beings, the idea that a machine could ever achieve human-level intelligence was almost laughable.

In the Turing test, which Turing himself originally called the “imitation game,” human participants conduct a conversation with unknown users to determine if they’re talking to a human or a computer. In 2014, a chatbot masquerading as a Ukrainian teenager named Eugene Goostman seemed to put one of the first nails in the Turing test’s coffin by fooling more than one-third of human interrogators into thinking they were talking to another human, although some researchers dispute the claim that the chatbot passed the test.

Today, we run into seemingly intelligent machines all day long. Our smart speakers tell us to bring umbrellas on our way out the door and large language models (LLMs) like ChatGPT can write promotion-worthy emails. Stacked up against a human, these machines might be easy to confuse with the real thing.

Does this mean the Turing test is a thing of the past?

In a new paper published 10 November in the journal Intelligent Computing, a pair of researchers have proposed a new kind of intelligence test that treats machines as participants in a psychological study to determine how closely their reasoning skills match those of human beings. The researchers are Philip Johnson-Laird, a Princeton psychology professor and pioneer of the mental model theory of human reasoning, and Marco Ragni, a professor of predictive analytics at Chemnitz University of Technology, in Germany.

In their paper, Johnson-Laird and Ragni argue that the Turing test was never a good measure of machine intelligence in the first place, as it fails to address the process of human thinking.

“Given that such algorithms do not reason in the way that humans do, the Turing test and any others it has inspired are obsolete,” they write.

Anders Sandberg, a senior research fellow at the University of Oxford’s Future of Humanity Institute, says he agrees with this assertion. That said, he’s not convinced that a human-reasoning assessment will be the ultimate test of intelligence either.

“As chatbots have approached and succeeded at the Turing test, it has quietly slipped away from importance,” Sandberg says. “This paper tries to see if a program reasons the way humans reason. That is both interesting and useful, but will of course only tell us if there is human-style intelligence, not some other form of potentially valuable intelligence.”

Likewise, Huma Shah, an assistant professor of computing at Coventry University, in England, whose research has focused on the Turing test and machine intelligence, says that even though Turing tests may be going out of fashion, that doesn’t necessarily mean they’re no longer useful.

“In terms of indistinguishability, no, [the Turing test is not obsolete],” Shah says. “You can apply indistinguishability to other areas where we would want a machine’s performance to be as good as or better than a human carrying out that task efficiently and ethically. For example, in facial recognition, or the ability to drive safely while avoiding hurting passengers and pedestrians.”

As for Johnson-Laird and Ragni’s test, it would be carried out in three steps. First, machines would be asked a number of questions to test their own reasoning. For example, they could be asked, “If Ann is intelligent, does it follow that Ann is intelligent or she is rich, or both?” Second, they would be tested on whether they understood their own reasoning, for example by explaining an answer with a response such as “Nothing in the premise supports the possibility that Ann is rich.” Finally, researchers would take a look under the hoods of the machines to determine whether the neural networks are built to simulate human cognition.
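
For readers curious about the logic behind the sample question, here is a minimal sketch (not taken from the paper) of how classical propositional logic treats it. It assumes a hypothetical encoding in which `a` stands for “Ann is intelligent” and `b` for “Ann is rich,” and it checks by brute force whether the premise guarantees the conclusion under every combination of true and false. The helper name `entails` is illustrative only.

```python
from itertools import product


def entails(premise, conclusion, n_vars=2):
    """Return True if every truth assignment that satisfies the premise
    also satisfies the conclusion (classical entailment, by truth table)."""
    return all(
        conclusion(*values)
        for values in product([False, True], repeat=n_vars)
        if premise(*values)
    )


# Hypothetical encoding: a = "Ann is intelligent", b = "Ann is rich".
def premise(a, b):
    return a


def conclusion(a, b):
    return a or b  # inclusive "or": intelligent, rich, or both


# Does "Ann is intelligent" guarantee "Ann is intelligent or rich, or both"?
classically_valid = entails(premise, conclusion)

# The reverse direction, for contrast: does the disjunction guarantee the premise?
converse_valid = entails(conclusion, premise)

print(classically_valid, converse_valid)
```

A truth-table checker like this accepts the inference, because it only asks whether the conclusion can ever be false while the premise is true. The sample response quoted above takes a different view, objecting that nothing in the premise supports the possibility that Ann is rich. That gap between a formal answer and a human-style one is the kind of difference the proposed test, as described here, is meant to probe.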

This last step is where Sandberg worries there could be complications.

“The last step can be very hard,” he says. “Most LLMs are vast neural networks that are not particularly inspectable, despite much research on how to do this.”

Translating a machine’s internal representation of reasoning into a form that humans can understand may even ultimately distort the original nature of the machine’s thought process, Sandberg says. In other words, would we recognize a machine’s interpretation of human reasoning if we saw it?

This question is especially complicated, as the science of human cognition itself isn’t yet set in stone.

While replacing the Turing test may not be a simple process, Shah says that alternatives like this reasoning test have the opportunity to advance how we think about these big questions, like what it means to be human. They may also help shed light on what it means to be a computer, such as what processes take place inside a neural network’s black box.

“If new tests for human-machine indistinguishability progress machine ‘explainability’—for example, the ‘reasoning’ in algorithms that render their decision-making comprehensible to the general public, such as in financial algorithms for insurance, mortgages, loans, etc., then this objective is an invaluable contribution to progressing intelligent machinery,” Shah says.

The Conversation (5)
Victor Lebedinskiy, 04 Dec, 2023

As I remember, the real Turing test was trickier. The idea was that a computer (player A) and a human (player B) both speak with an investigator, who has to figure out who is who. One of the players should help the investigator; the other should try to make him make a mistake. We can speak of a computer as a machine with human-level intelligence when it plays this game as well as a man does.

In the original Turing test, the machine should compete with a man, not just pretend to be a man.

Jean-Marc Deschamps, 01 Dec, 2023

And can we follow the neurons to know how a person thinks?

James Brady, 07 Dec, 2023

Donald is, in the terms of a precision machinist, dead nuts on!