Raidar: geneRative AI Detection viA Rewriting (2401.12970v2)
Abstract: We find that LLMs are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency arises because LLMs often perceive AI-generated text as high quality and therefore make fewer modifications. We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and computing the edit distance between the input and its rewrite. We dub our geneRative AI Detection viA Rewriting method Raidar. Raidar significantly improves the F1 detection scores of existing AI content detection models -- both academic and commercial -- across various domains, including news, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Operating solely on word symbols without high-dimensional features, our method is compatible with black-box LLMs and is inherently robust to new content. Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
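In code, the detection recipe described in the abstract reduces to three steps: ask an LLM to rewrite the input, measure how much the rewrite differs from the input, and flag inputs the model barely changed. Below is a minimal sketch in Python. The `rewrite` callable standing in for a black-box LLM API call, the whitespace tokenization, and the 0.2 cutoff are illustrative assumptions, not the paper's exact pipeline, which builds richer rewrite-based features and calibrates a classifier on labeled data.

```python
from typing import Callable, List

def levenshtein(a: List[str], b: List[str]) -> int:
    """Edit distance over token sequences (Levenshtein, 1966),
    computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

def rewrite_distance(text: str, rewrite: Callable[[str], str]) -> float:
    """Length-normalized edit distance between a text and its LLM rewrite.
    Low values mean the LLM left the text largely untouched, which, per
    the paper's observation, is evidence the text is machine-generated."""
    original = text.split()             # assumption: whitespace tokens
    rewritten = rewrite(text).split()
    return levenshtein(original, rewritten) / max(len(original), 1)

def looks_ai_generated(text: str, rewrite: Callable[[str], str],
                       threshold: float = 0.2) -> bool:
    # The 0.2 threshold is a placeholder for illustration; in practice
    # one would tune it (or train a classifier) on labeled examples.
    return rewrite_distance(text, rewrite) < threshold
```

In practice, `rewrite` would wrap a call to a black-box model such as ChatGPT with a rewriting instruction prepended to the input; averaging the distance over several different rewriting prompts should make the signal more stable than a single query.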
Authors: Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang