Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Human or Not? A Gamified Approach to the Turing Test (2305.20010v1)

Published 31 May 2023 in cs.AI, cs.CL, cs.CY, and cs.HC

Abstract: We present "Human or Not?", an online game inspired by the Turing test, that measures the capability of AI chatbots to mimic humans in dialog, and of humans to tell bots from other humans. Over the course of a month, the game was played by over 1.5 million users who engaged in anonymous two-minute chat sessions with either another human or an AI LLM which was prompted to behave like humans. The task of the players was to correctly guess whether they spoke to a person or to an AI. This largest scale Turing-style test conducted to date revealed some interesting facts. For example, overall users guessed the identity of their partners correctly in only 68% of the games. In the subset of the games in which users faced an AI bot, users had even lower correct guess rates of 60% (that is, not much higher than chance). This white paper details the development, deployment, and results of this unique experiment. While this experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future which will commingle humans and AI.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Daniel Jannai (5 papers)
  2. Amos Meron (1 paper)
  3. Barak Lenz (8 papers)
  4. Yoav Levine (24 papers)
  5. Yoav Shoham (22 papers)
Citations (19)

Summary

Human or Not? A Gamified Approach to the Turing Test

The paper "Human or Not? A Gamified Approach to the Turing Test" presents an innovative and large-scale social experiment conducted by AI21 Labs to assess the current capabilities of AI LLMs in mimicking human behavior through natural language conversation. This experiment provides a fresh perspective on the classic Turing Test, using a gamified setup to engage over 1.5 million participants in identifying whether they conversed with a human or an AI chatbot in two-minute chat sessions. The experiment's scale and design offer significant contributions to the understanding of human-AI interaction dynamics, as well as AI's progress in achieving human-like conversational abilities.

The experiment revealed that users correctly identified whether they were conversing with a human or a bot only 68% of the time. Notably, participants were particularly challenged when interacting with AI, identifying the bots correctly only 60% of the time. These results, obtained from approximately 10 million interactions, closely align with Alan Turing's prediction that an average human's ability to distinguish between human and AI would hover around 70% accuracy after a short dialogue—a prediction made more than 70 years ago. This finding underscores the significant advancements in AI LLMs and highlights the persistent complexity in differentiating AI-generated language from human language in brief interactions.

Several key strategies emerged from the participants as they attempted to distinguish between their interlocutors. These included scrutinizing message grammaticality, politeness, and engagement in subjects presumed difficult for AI, such as emotional topics or contemporary happenings. Conversely, AI chatbots were designed to mimic human traits such as spelling mistakes, use of slang, and contextual awareness, further complicating the identification task. These complexities highlight both strengths and limitations in current AI models, emphasizing their growing capability to emulate human-like conversational patterns while also indicating areas requiring further refinement.

The experiment also illustrated participants' adaptive strategies in signaling their own humanity, often through leveraging imperfections traditionally seen as uniquely human, such as typos or rudeness. Intriguingly, some players sought to impersonate AI, reflecting deep-seated perceptions about AI characteristics and communication styles.

The implications of this paper are substantial for the broader landscape of AI development and deployment. The findings provide a statistically robust benchmark against which future enhancements in AI conversational capabilities can be measured. Additionally, they illuminate the nuanced and evolving nature of human-AI interaction, inviting further inquiry into how AI systems can be ethically and effectively integrated into various domains of human activity. Future studies might focus on extending the interaction time, employing different AI systems, or exploring varying cultural contexts to enrich understanding across diverse user demographics.

In conclusion, this paper offers a comprehensive assessment of the current dichotomy between human and AI conversation through an empirical lens. It demonstrates the contemporary AI models' striking improvements in generating human-like interactions, while concurrently posing vital questions on our preparedness to ethically and responsibly harness such technology. As AI continues to intersect more intricately with human life, experiments like this stand as pivotal reference points for the complexities inherent in crafting AI that genuinely communicates with us on human terms.

X Twitter Logo Streamline Icon: https://streamlinehq.com
Youtube Logo Streamline Icon: https://streamlinehq.com