Interpretation of Turing’s 70% benchmark as a success criterion

Ascertain whether Alan Turing’s 1950 prediction that “an average interrogator will not have more than a 70 percent chance of making the right identification after five minutes of questioning” was intended as a formal definition of passing the imitation game (the Turing test), or merely as an illustrative forecast or benchmark.

Background

The paper discusses how to determine whether a system has passed the Turing test. It notes that Turing’s 1950 paper included a prediction about an interrogator’s accuracy after five minutes of questioning, which has sometimes been used to set a 30% pass-rate benchmark: if the interrogator identifies the machine correctly at most 70% of the time, the machine is mistaken for a human at least 30% of the time. The authors argue that this benchmark appears arbitrary and question whether Turing intended the prediction to serve as a formal definition of success for the test.
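A minimal Python sketch of the arithmetic behind the 30% figure, assuming each interrogation ends in a single human-or-machine verdict about the machine witness. The names `MAX_INTERROGATOR_ACCURACY`, `PASS_RATE_THRESHOLD`, and `meets_30_percent_benchmark` are hypothetical, and the helper deliberately encodes the contested reading (the prediction treated as a pass criterion) rather than endorsing it. It uses `fractions.Fraction` so that 1 - 7/10 is exactly 3/10; with floats, `1 - 0.7` evaluates to 0.30000000000000004, so a rate of exactly 30% would wrongly fail.

```python
from fractions import Fraction

# Turing's prediction as quoted above: the interrogator's chance of a
# correct identification is at most 70%.
MAX_INTERROGATOR_ACCURACY = Fraction(7, 10)

# Complementary threshold: the share of interrogations in which the machine
# must be misidentified as human. 1 - 7/10 == 3/10 exactly.
PASS_RATE_THRESHOLD = 1 - MAX_INTERROGATOR_ACCURACY


def meets_30_percent_benchmark(judged_human: int, total_games: int) -> bool:
    """Return True if the observed rate of 'human' verdicts is at least 30%.

    Hypothetical helper: it assumes one verdict per interrogation and treats
    the 70% prediction as a pass criterion, which is exactly the reading the
    open question above disputes.
    """
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    # Exact rational comparison avoids floating-point rounding at the boundary.
    return Fraction(judged_human, total_games) >= PASS_RATE_THRESHOLD
```

With hypothetical counts, 30 "human" verdicts in 100 games meets the threshold and 29 does not. A point estimate like this ignores sampling error; an actual evaluation would also need a significance test.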

Clarifying Turing’s intention matters because it determines how contemporary experiments, including this paper’s randomized, controlled two-player test with GPT-4, should be evaluated against historical standards of success. Until this interpretive question is resolved, debates about what counts as “passing” risk resting on potentially unfounded criteria.

References

This benchmark seems arbitrary however, and it's not clear that Turing meant it as a definition of success.

— People cannot distinguish GPT-4 from a human in a Turing test (2405.08007 - Jones et al., 9 May 2024) in Discussion, subsection "Does GPT-4 pass the Turing test?"
