Smaller LLMs as Effective Black-box Machine-Generated Text Detectors
The paper entitled "Smaller LLMs are Better Black-box Machine-Generated Text Detectors" offers a compelling investigation into the capabilities of LLMs as detectors of machine-generated text, focusing on scenarios where the generating model's identity and details are unknown. As the use of LLMs becomes increasingly prevalent across various sectors, accurately distinguishing between human-created and machine-generated content is paramount for maintaining the integrity of information dissemination, particularly in contexts like news verification and the authenticity of online reviews.
Core Contributions
The central contribution of this paper is not a model's ability to detect its own generations, but an assessment of the feasibility and efficacy of using one LLM to discern text generated by another. Specifically, the work posits that smaller and partially-trained models serve as more effective universal detectors, and that this effectiveness does not depend on overlap in architecture or training data between the detector and generator models. For instance, a small model such as OPT-125M achieves an AUC of 0.81 in detecting content generated by ChatGPT, significantly surpassing a larger model from the GPT family, GPTJ-6B, which achieves an AUC of only 0.45.
Methodology
The researchers employ a methodology grounded in the concept of local optimality on the probability surface of an LLM. A target pool of sequences is formed, composed equally of human-written and machine-generated text. Perturbations of each sequence are then generated, and a detector model's likelihood function is used to test whether the original sequence sits at a local likelihood optimum relative to its perturbed neighbors.
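The local-optimality test described above can be sketched in a few lines. The version below is a toy illustration, not the paper's implementation: the frequency-based `log_likelihood` stands in for the average token log-probability under a detector LM (e.g., a small model like OPT-125M), and the filler-word `perturb` stands in for the mask-and-refill perturbation a real pipeline would use. Only the structure of the statistic is faithful: score the original, score perturbed neighbors, and compare.

```python
import math
import random

# Small filler vocabulary used by the toy perturbation below (an assumption for
# illustration; a real pipeline would use a mask-filling model to rewrite spans).
FILLERS = ["and", "of", "to"]

def log_likelihood(text: str) -> float:
    """Toy per-token log-likelihood based on word frequency within the text.

    Stand-in for the average token log-probability under a detector LM."""
    words = text.split()
    counts = {w: words.count(w) for w in set(words)}
    return sum(math.log(counts[w] / len(words)) for w in words) / len(words)

def perturb(text: str, rng: random.Random) -> str:
    """Replace one random word with a filler word (toy neighborhood generation)."""
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(FILLERS)
    return " ".join(words)

def curvature_score(text: str, n_perturbations: int = 20, seed: int = 0) -> float:
    """Local-optimality statistic: the original sequence's likelihood minus the
    mean likelihood of its perturbed neighbors. Machine-generated text tends to
    sit near a local likelihood optimum, so higher scores suggest machine text."""
    rng = random.Random(seed)
    neighbors = [perturb(text, rng) for _ in range(n_perturbations)]
    return log_likelihood(text) - sum(map(log_likelihood, neighbors)) / n_perturbations
```

Under this toy scorer, highly repetitive text (a local optimum of the frequency-based likelihood) scores above zero, while text the perturbations cannot make less likely scores near zero, mirroring how the real statistic separates machine-generated from human-written sequences.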
Experimental Analysis
Utilizing a diverse range of models spanning various sizes, architectures, and training regimes, the paper thoroughly explores the correlation between model scale and detection performance. Smaller models consistently emerge as superior cross-detectors. For example, the OPT-125M model nearly matches its self-detection performance, with an AUC gap of merely 0.07 when cross-detecting machine-generated content. Notably, smaller models exhibit sharper likelihood curvature around machine-generated text and do not over-assign likelihood to the outputs of larger models, giving them a broader detection range.
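The AUC figures reported throughout the paper can be reproduced from raw detection scores with a simple rank statistic. A minimal, dependency-free sketch (the variable names are illustrative, not taken from the paper's code):

```python
def auroc(machine_scores, human_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen machine-generated sample receives a higher detection score than a
    randomly chosen human-written one (ties count as half a win)."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))
```

On this scale a perfect detector scores 1.0 and chance is 0.5, so GPTJ-6B's reported 0.45 on ChatGPT output is slightly worse than random guessing, while OPT-125M's 0.81 reflects a strong ranking of machine text above human text.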
Theoretical and Practical Implications
This research offers important theoretical insight into how LLMs behave as detectors, particularly the finding that model size inversely correlates with cross-detection capability. From a practical perspective, these findings bolster the utility of smaller models in applications where access to the generating model is limited or constrained by privacy and proprietary barriers.
Limitations and Future Directions
Despite the robustness demonstrated in leveraging smaller models for text detection, nuances such as the fidelity of the neighborhood generation and model-specific biases are areas for deeper exploration. Future research might extend these insights into the simultaneous optimization of detection algorithms and model efficiency, possibly integrating these models into streamlined pipelines capable of deploying at scale without sacrificing performance.
In conclusion, this paper presents a critical examination of the utility of smaller LLMs as universal detectors of machine-generated text, providing valuable direction for both academic inquiry and practical deployment scenarios amidst the growing use of LLMs in varied information-rich environments.