Investigating Human Detection of Machine-Generated Text Boundaries
The paper "Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text" addresses a critical aspect of engaging with neural LLMs (LMs): human detection of generated text. The authors explore the ability of human annotators to identify transitions from human-written to machine-generated text within documents, a nuanced task compared to previous binary classification research. This investigational shift is pertinent given the real-world applications of LMs where text generation typically continues from a human-provided prompt.
Core Findings and Methodology
- Boundary Detection Task: The paper frames detection as boundary identification, using the gamified RoFT (Real or Fake Text) platform. Players read a document sentence by sentence and predict where it transitions from human-written to machine-generated text, so the task asks not only whether a document contains generated content but where the generation begins (see the scoring sketch after this list).
- Human Performance Variability: Results show substantial variance in annotator skill, with performance improving when proper incentives and guidance are provided. Annotators identified the exact boundary sentence 23.4% of the time, notably better than random chance, though reliably pinpointing the boundary remained difficult.
- Factors Affecting Detection: The paper examines variables such as model size, decoding strategy, and text genre. Larger models produce text that is harder to catch, with GPT-2 XL generations proving more difficult to detect than those from GPT-2 small. Genre also shapes the errors annotators make; structured genres such as recipes, for instance, expose generation mistakes more readily and make machine-generated text easier to spot (a decoding sketch follows this list).
- Game and Incentives: With a point-based incentive built into the detection game, players motivated by rewards improved over time, indicating that identifying machine-generated text is a trainable skill.
- Comparison Across Models: The paper compares a range of generation choices. For instance, model fine-tuning and control codes were expected to make generations harder to detect but showed limited impact in practice.
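To make the boundary-detection setup concrete, here is a minimal sketch of a RoFT-style scoring rule: the annotator earns the most points for selecting the exact boundary sentence, fewer points the further past the boundary the guess lands, and none for guessing while the text is still human-written. The specific point values and the toy data below are illustrative assumptions, not the paper's published scoring table.

```python
from typing import List


def score_guess(guess_idx: int, boundary_idx: int, max_points: int = 5) -> int:
    """Score one annotation in a RoFT-style boundary game.

    `boundary_idx` is the index of the first machine-generated sentence.
    Assumed rule (illustrative): full points for the exact boundary, one
    point lost per sentence the guess lands past it, zero for guessing early.
    """
    if guess_idx < boundary_idx:
        return 0  # guessed while the text was still human-written
    return max(0, max_points - (guess_idx - boundary_idx))


def exact_accuracy(guesses: List[int], boundaries: List[int]) -> float:
    """Fraction of annotations that hit the boundary sentence exactly."""
    hits = sum(g == b for g, b in zip(guesses, boundaries))
    return hits / len(guesses)


if __name__ == "__main__":
    # Toy example: four annotations of ten-sentence documents.
    guesses = [4, 6, 2, 7]
    boundaries = [4, 4, 5, 7]
    print([score_guess(g, b) for g, b in zip(guesses, boundaries)])  # [5, 3, 0, 5]
    print(exact_accuracy(guesses, boundaries))                       # 0.5
    # With roughly ten equally likely boundary positions per document,
    # random guessing hits the boundary about 10% of the time, which is
    # the baseline against which the 23.4% exact-boundary rate is read.
```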
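One of the decoding choices varied in studies like this is nucleus (top-p) sampling. The sketch below shows the core truncation step in isolation, independent of any particular model or of the paper's exact decoding settings, to make clear what the p parameter controls.

```python
import numpy as np


def nucleus_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id with nucleus (top-p) sampling.

    Generic illustration: keep the smallest set of tokens whose cumulative
    probability reaches p, renormalize, and sample from that set. Lower p
    makes output more predictable; p = 1.0 recovers pure sampling.
    """
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                    # tokens sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # smallest prefix covering mass p
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()       # renormalize within the nucleus
    return int(rng.choice(kept, p=kept_probs))


if __name__ == "__main__":
    vocab_probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
    print(nucleus_sample(vocab_probs, p=0.4))   # almost always token 0
    print(nucleus_sample(vocab_probs, p=1.0))   # any token, in proportion to its probability
```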
Implications and Future Directions
This work underscores the complexity of human interaction with machine-generated text and the risks of undetected generation in sensitive domains. It provides a methodological basis for future evaluations of LLM outputs and suggests that human detection and evaluation skills can be honed. The variability in human ability and the impact of incentives point to actionable ways of improving and predicting human oversight in applications involving LMs.
Further research could explore automating the detection task and benchmarking AI systems against human performance at identifying generated text. Examining detection ability across more varied demographic groups and application scenarios could also provide richer insight into the challenges faced internationally.
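As one way to benchmark automated systems against the human results, a simple per-sentence surprisal baseline could be scored on the same boundary task: evaluate each sentence under a pretrained language model and predict the boundary at the largest drop in average surprisal, on the assumption that machine-generated text tends to be more probable under a similar model. The sketch below uses the Hugging Face transformers GPT-2 API; it is a hypothetical baseline, not a method from the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def sentence_nll(sentence: str) -> float:
    """Average negative log-likelihood of a sentence under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return loss.item()


def predict_boundary(sentences: list[str]) -> int:
    """Guess the index of the first machine-generated sentence.

    Heuristic (an assumption, not the paper's method): machine text is
    usually assigned lower surprisal by a similar LM, so predict the
    boundary at the largest drop in per-sentence NLL.
    """
    nlls = [sentence_nll(s) for s in sentences]
    drops = [nlls[i - 1] - nlls[i] for i in range(1, len(nlls))]
    return max(range(1, len(nlls)), key=lambda i: drops[i - 1])


# Usage: run predict_boundary on a ten-sentence document and compare the
# guess with human annotations and the gold boundary from the RoFT data.
```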
By contributing to the growing body of human annotations tied to machine-generated content, this paper advances the discourse on maintaining oversight and integrity in AI-powered text generation systems. It also invites extensions of the approach, such as probing the reasoning annotators rely on, improving automated detection algorithms, and refining human-machine collaborative frameworks for combating misinformation and other harmful content generated by neural models.