DUPE: Detection Undermining via Prompt Engineering for Deepfake Text (2404.11408v1)
Abstract: As LLMs become increasingly commonplace, concern about distinguishing between human and AI text increases as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing a technology with which they are unfamiliar, teachers may turn to publicly available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, posing potential harm to students who are falsely accused of academic dishonesty. In this paper, we evaluate three different AI text detectors (Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero) against human and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.
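The two error rates named in the abstract can be made concrete with a minimal sketch. It assumes the standard definitions, with "positive" meaning "flagged as AI-generated": the false positive rate is the share of human-written essays a detector flags, and the false negative rate is the share of AI-generated essays it misses. The function name `error_rates` and the sample labels are hypothetical and purely illustrative; they are not data or code from the paper.

```python
def error_rates(is_ai, flagged_as_ai):
    """Return (false_positive_rate, false_negative_rate) for a detector.

    is_ai:         list of bools, True if the essay was AI-generated.
    flagged_as_ai: list of bools, True if the detector flagged the essay.
    """
    pairs = list(zip(is_ai, flagged_as_ai))
    # Human essays wrongly flagged as AI.
    false_positives = sum(1 for truth, flag in pairs if not truth and flag)
    # AI essays the detector failed to flag.
    false_negatives = sum(1 for truth, flag in pairs if truth and not flag)
    human_total = sum(1 for truth in is_ai if not truth)
    ai_total = sum(1 for truth in is_ai if truth)
    fpr = false_positives / human_total if human_total else 0.0
    fnr = false_negatives / ai_total if ai_total else 0.0
    return fpr, fnr


# Made-up example: 4 human essays followed by 4 AI essays.
truth = [False, False, False, False, True, True, True, True]
flags = [True, False, False, False, False, False, True, True]
print(error_rates(truth, flags))  # (0.25, 0.5)
```

Under these definitions, a paraphrasing attack that succeeds turns flagged AI essays into unflagged ones, which raises the false negative rate and leaves the false positive rate unchanged.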
- Harald Baayen. Statistical Models for Word Frequency Distributions: A Linguistic Evaluation. Computers and the Humanities, 1992(26):347–363, December 1992.
- Universal Sentence Encoder, March 2018.
- On the Possibilities of AI-Generated Text Detection, April 2023.
- A Watermark for Large Language Models. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- On the Reliability of Watermarks for Large Language Models, June 2023.
- GPT Detectors Are Biased Against Non-Native English Writers, April 2023.
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, January 2023.
- The Regents of the University of Michigan. Michigan Corpus of Upper-level Student Papers, 2009.
- Deepfake Text Detection: Limitations and Opportunities. 2023.
- Can AI-Generated Text be Reliably Detected?, March 2023.
- Generalizing to Unseen Domains: A Survey on Domain Generalization. In Proceedings of the Thirtieth, pages 4627–4635, Montreal, Canada, 2021.
- Testing of Detection Tools for AI-Generated Text, June 2023.
- James Weichert
- Chinecherem Dimobi