Disentangle displacement effects from population confounds in Turing test accuracy

Ascertain whether the reduction in adjudication accuracy observed for displaced human judges who read Turing test transcripts, relative to interactive human interrogators, is attributable to displacement per se rather than to differences in participant populations (social-media-recruited interactive interrogators versus undergraduate displaced adjudicators).

Background

The paper compares interactive Turing tests, where human interrogators directly question witnesses, with displaced Turing tests, where different human judges read transcripts of those interactions. Displaced human adjudicators were significantly less accurate than interactive interrogators at distinguishing humans from AI witnesses.

However, the participant pools differed: interactive interrogators were recruited via social media, while displaced adjudicators were undergraduate students. This confound prevents attributing the observed accuracy drop solely to displacement. It remains unresolved whether displacement itself reduces detection accuracy, whether demographic and motivational differences between the two pools are responsible, or whether both contribute.
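One way to see why the confound matters is a simple accuracy decomposition. The following is a minimal sketch that assumes a hypothetical fully crossed 2×2 design, in which both recruitment pools are run in both the interactive and the displaced mode. The paper did not run this design. The cell labels, the names `Cell` and `decompose_gap`, and the counts in the usage block are illustrative assumptions, not data from the study. The comparison the paper could make corresponds to the diagonal of this table (social media × interactive vs. undergraduate × displaced). That gap splits exactly into a displacement effect measured within one population plus a population effect measured within one mode, and only a crossed design can estimate both parts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Correct and total verdicts for one (population, mode) cell.
    Hypothetical structure for illustration only."""
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def decompose_gap(cells: dict[tuple[str, str], Cell]) -> dict[str, float]:
    """Split the naive gap acc(social, interactive) - acc(undergrad, displaced)
    into a displacement effect (within social-media recruits) and a
    population effect (within the displaced mode). These two parts sum
    exactly to the naive gap. Point estimates only: a real analysis would
    also need uncertainty, e.g. a logistic model with an interaction term."""
    si = cells[("social", "interactive")].accuracy
    sd = cells[("social", "displaced")].accuracy
    ui = cells[("undergrad", "interactive")].accuracy
    ud = cells[("undergrad", "displaced")].accuracy
    return {
        "naive_gap": si - ud,
        "displacement_within_social": si - sd,
        "population_within_displaced": sd - ud,
        "displacement_within_undergrad": ui - ud,
        # Does displacement hurt one population more than the other?
        "interaction": (si - sd) - (ui - ud),
    }


if __name__ == "__main__":
    # Purely illustrative counts; NOT results from the paper.
    example = {
        ("social", "interactive"): Cell(60, 100),
        ("social", "displaced"): Cell(55, 100),
        ("undergrad", "interactive"): Cell(52, 100),
        ("undergrad", "displaced"): Cell(45, 100),
    }
    for name, value in decompose_gap(example).items():
        print(f"{name}: {value:+.2f}")
```

If the within-population displacement effects are near zero while the population effect is large, the observed drop would be mostly a recruitment artifact. The reverse pattern would support a genuine displacement effect.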

References

Interactive interrogators were recruited via social media while displaced participants were undergraduate students. We therefore cannot know whether this drop in accuracy is purely due to the effect of displacement.

— GPT-4 is judged more human than humans in displaced and inverted Turing tests (2407.08853 - Rathi et al., 11 Jul 2024) in Section 3.3 Discussion
