Large Language Models can be Guided to Evade AI-Generated Text Detection (2305.10847v6)

Published 18 May 2023 in cs.CL and cs.AI

Abstract: LLMs have shown remarkable performance in various tasks and have been extensively utilized by the public. However, increasing concerns regarding the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient, as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be universally used against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that SICO-generated text achieves human-level readability and task completion rates while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against the SICO attack. The code is publicly available at https://github.com/ColinLu50/Evade-GPT-Detector.
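To make the core idea concrete, below is a minimal, hypothetical sketch of a substitution-based optimization loop in the spirit of SICO; it is not the authors' implementation (see the linked repository for that). It assumes a black-box detector_score(text) function returning the detector's estimated probability that the text is AI-generated, and uses WordNet synonyms as substitution candidates; a greedy pass keeps any swap that lowers the detector score.

```python
# Hypothetical sketch of a SICO-style substitution loop (illustrative only).
# `detector_score` is an assumed black-box returning P(text is AI-generated);
# substitution candidates come from WordNet synonyms.
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)  # one-time corpus download

def synonym_candidates(word):
    """Collect WordNet lemmas for `word` as substitution candidates."""
    cands = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            cands.add(lemma.name().replace("_", " "))
    cands.discard(word)
    return cands

def substitution_optimize(text, detector_score):
    """Greedily apply word substitutions that lower the detector score."""
    words = text.split()
    best_score = detector_score(text)
    for i, word in enumerate(words):
        for cand in synonym_candidates(word):
            trial = words[:i] + [cand] + words[i + 1:]
            score = detector_score(" ".join(trial))
            if score < best_score:  # keep only score-reducing swaps
                best_score, words = score, trial
    return " ".join(words), best_score
```

In the paper's setting, text optimized this way serves as in-context examples inside a task-specific prompt, so the LLM itself produces detector-evasive output at inference time without an external paraphraser.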
