We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5-minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time—not significantly more or less often than the humans they were being compared to—while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.
(emphasis mine)
While checking for a published version (there isn't one yet), I stumbled on this general-audience article on the topic, in case the hedged language of the arXiv preprint is not your thing.
Wow, really surprised it hadn't been done before.
I've long thought we were past the Turing Test. Now we just need a new definition of AGI.
Surprising indeed. I thought it'd have been one of the obvious benchmarks people would try to test. I guess they did test it, but this is the first time it has been passed.
On the contrary. A "small" language model—by which we generally mean one with on the order of millions (rather than billions) of parameters—struggles to mimic human conversation convincingly for several interlocking reasons:
Fewer parameters mean the model can store and manipulate far less information about language patterns, world knowledge, and subtle linguistic nuances.
As a result, it often resorts to simplistic or repetitive responses, rather than the rich variety of expression a human would use.
Small models tend to overfit to the specific data they were trained on, so they struggle with novel topics or unexpected turns in conversation.
They lack the depth to maintain a consistent persona, long-term context, or coherent thread over multiple turns, making their dialogue feel disjointed or “robotic.”
Passing a Turing test usually requires not just fluent language but also common-sense reasoning, up‑to‑date facts, and the ability to draw inferences.
With constrained capacity, small models cannot internalize large-scale factual databases or sophisticated reasoning patterns; they often hallucinate or give incorrect answers when pressed.
At their core, small LMs are powerful pattern‑matchers but lack the deeper latent structures (e.g., causal models, theory of mind) that larger models can approximate.
This leads to responses that may look grammatically correct but fail to capture intentions, emotions, or the pragmatic subtleties of human dialogue.
Implications for the Turing Test
Alan Turing’s original proposal envisioned an interlocutor capable of sustained, varied, and contextually appropriate conversation. Small language models simply don’t have the “brain‑like” resources—be it memory, breadth of knowledge, or reasoning scaffolds—to convincingly impersonate a human over an extended exchange. In short, they lack both the scale and depth required to fool a well‑informed judge.
I prefer original thoughts.
Sure, we all do, but AI is taking over the world courtesy of human knowledge, because we actually programmed the AI to function the way it does. 🤷♂️
What do you get, personally, out of copy-pasting this kind of text from an LLM? Genuine question. I really don't understand. Would you have done it in absence of the incentive of sats? Are you hoping to start a discussion?
I stopped reading as soon as I realised it was AI.
Yeah sure the territory is AI, lol
Ok, you do you :)
Wish I understood your phrase.
Ask ChatGPT ;)
Jk, it just means you do as you wish. I won't mind and will just focus on my own stuff.