DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (2301.11305v2)

Published 26 Jan 2023 in cs.CL and cs.AI

Abstract: The increasing fluency and widespread usage of LLMs highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained LLM (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.

Authors (5)

Eric Mitchell
Yoonho Lee
Alexander Khazatsky
Christopher D. Manning
Chelsea Finn


Summary

DetectGPT: Zero-Shot Detection of Machine-Generated Text

The proliferation of LLMs has made it harder to distinguish human-written text from machine-generated text. This paper introduces DetectGPT, a zero-shot method for detecting LLM-generated text. It exploits a property of the curvature of an LLM's log probability function and is more discriminative than existing zero-shot methods, without training a separate classifier, collecting a dataset of real or generated passages, or watermarking generated text.

Problem Context and Motivation

LLMs such as GPT-3 and ChatGPT produce fluent, seemingly coherent responses across diverse topics, including science, mathematics, and current events. These responses are often incorrect, yet convincing enough to be mistaken for human-written content. This matters in fields such as education and journalism, where the integrity and factual accuracy of content are paramount. DetectGPT addresses the need for reliable tools that detect machine-generated text and reduce the risks of undetected AI-written content.

Key Contributions

Core Hypothesis: The authors hypothesize that text sampled from an LLM tends to occupy regions of negative curvature of that model's log probability function. The supporting evidence is empirical: perturbing model-generated text tends to lower its log probability, whereas perturbing human-written text may raise or lower it.
Algorithm Development: Building on this hypothesis, DetectGPT compares the log probability of the original passage with the log probabilities of several perturbed versions. If the perturbed versions have considerably lower log probability on average, the passage is likely machine-generated.
Empirical Validation: The paper reports that DetectGPT is more discriminative than prior zero-shot methods. For fake news articles generated by the 20B-parameter GPT-NeoX, it raises the Area Under the Receiver Operating Characteristic curve (AUROC) from 0.81 for the strongest zero-shot baseline to 0.95.
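
The comparison described in these contributions can be written as a single score. The block below is a sketch of that score, not a transcription of the paper's text: the symbols are assumed here for illustration, with p_theta the model of interest, q the perturbation model, k the number of perturbations, and epsilon a decision threshold that would have to be chosen by the user.

```latex
% Score: log probability of the passage minus the expected
% log probability of its perturbations.
\[
d(x, p_\theta, q) \;=\; \log p_\theta(x)
  \;-\; \mathbb{E}_{\tilde{x} \sim q(\cdot \mid x)}
  \bigl[\log p_\theta(\tilde{x})\bigr]
\]

% Monte Carlo estimate from k sampled perturbations.
\[
\hat{d}(x) \;=\; \log p_\theta(x)
  \;-\; \frac{1}{k} \sum_{i=1}^{k} \log p_\theta(\tilde{x}_i),
  \qquad \tilde{x}_i \sim q(\cdot \mid x)
\]

% Decision rule: flag x as likely generated by p_theta
% when \hat{d}(x) > \epsilon.
```

Under the negative-curvature hypothesis, the score tends to be clearly positive for text sampled from the model and closer to zero for human-written text.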

Methodology

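DetectGPT generates slight perturbations of a candidate passage using a generic pre-trained model such as T5. The core hypothesis is that text sampled from an LLM tends to lie in regions of negative curvature of that model's log probability function, near local maxima, so small rewrites tend to lower its log probability. DetectGPT computes the log probability of the original passage under the model of interest minus the average log probability of the perturbed passages, which equals the average log probability ratio between the original and its perturbations. A large positive value suggests the passage was generated by that model.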
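
The scoring step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `perturbation_discrepancy`, `toy_log_prob`, and `toy_perturb` are hypothetical names, and the two toy functions are stand-ins for a real source model and a real perturbation model such as T5.

```python
import random
import statistics
from typing import Callable


def perturbation_discrepancy(
    passage: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Log probability of the passage minus the mean log probability
    of n_perturbations perturbed versions of it."""
    rng = random.Random(seed)
    perturbed_scores = [
        log_prob(perturb(passage, rng)) for _ in range(n_perturbations)
    ]
    return log_prob(passage) - statistics.fmean(perturbed_scores)


# --- Toy stand-ins (assumptions for illustration, NOT real models) ---
COMMON_WORDS = {"the", "cat", "sat", "on", "mat"}


def toy_log_prob(text: str) -> float:
    # Pretend model: common words cost -1 each, all other words cost -5.
    return sum(-1.0 if w in COMMON_WORDS else -5.0 for w in text.split())


def toy_perturb(text: str, rng: random.Random) -> str:
    # Pretend perturbation: replace one randomly chosen word with a rare word.
    words = text.split()
    words[rng.randrange(len(words))] = "zebra"
    return " ".join(words)


if __name__ == "__main__":
    score = perturbation_discrepancy(
        "the cat sat on the mat", toy_log_prob, toy_perturb
    )
    # A score well above zero suggests the passage sits near a local
    # maximum of the (toy) log probability function.
    print(score)
```

In practice, `log_prob` would sum token log probabilities under the model of interest, `perturb` would mask and refill short spans with a mask-filling model, and the resulting score would be compared against a threshold chosen for the application.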

Comparative Analysis

The paper compares DetectGPT with existing zero-shot detection methods, which primarily rely on average token log probabilities and token ranks. DetectGPT shows a consistent edge over these baselines across source models ranging from GPT-2 to GPT-NeoX. The paper also benchmarks DetectGPT against supervised detectors and finds it competitive, particularly under distribution shift, which typically degrades supervised detectors.
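
To make the comparison concrete, the sketch below shows how baseline scores of this kind and an AUROC value could be computed. It is an assumption-laden illustration: the function names are hypothetical, the per-token log probabilities and ranks are assumed to come from the source model, and the AUROC is computed with the standard pairwise definition rather than any code from the paper.

```python
from typing import Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Baseline score: average per-token log probability.
    Higher values suggest machine-generated text."""
    return sum(token_log_probs) / len(token_log_probs)


def negative_mean_rank(token_ranks: Sequence[int]) -> float:
    """Baseline score: negated average rank of each observed token
    (rank 1 = the model's most likely token), so higher is again
    more machine-like."""
    return -sum(token_ranks) / len(token_ranks)


def auroc(machine_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Probability that a random machine-generated passage scores higher
    than a random human-written passage; ties count as half."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


if __name__ == "__main__":
    # Hypothetical detector scores, made up for illustration only.
    machine = [0.9, 0.8, 0.4]
    human = [0.5, 0.3, 0.2]
    print(auroc(machine, human))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so any detector that assigns a scalar score to each passage can be evaluated this way.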

Implications and Future Directions

DetectGPT advances the detection of AI-generated content while relying only on log probabilities from the model of interest and perturbations from a generic pre-trained model, with no detector training or labeled datasets. The approach may extend to other modalities, such as audio or images. It may also complement watermarking as a way to make machine-generated text more detectable. Future research could optimize the perturbation function, explore ensemble methods for cases where the source model is unknown, and examine detection robustness across languages and dialects.

In sum, DetectGPT is a practical step toward verifying the provenance of text as LLM capabilities expand. It needs no training data and can be applied across data domains, which makes it a useful advance in zero-shot detection.
