
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios

Published 31 Oct 2024 in cs.CL and cs.AI | (2410.23746v3)

Abstract: Detecting text generated by LLMs has attracted great interest recently. With zero-shot methods like DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applications remains underexplored. In this study, we present a new benchmark, DetectRL, which shows that even state-of-the-art (SOTA) detection techniques still underperform on this task. We collected human-written datasets from domains where LLMs are particularly prone to misuse. Using popular LLMs, we generated data that better aligns with real-world applications. Unlike previous studies, we employed heuristic rules to create adversarial LLM-generated text, simulating various prompt usages, human revisions such as word substitutions, and writing noise such as spelling mistakes. Our development of DetectRL reveals the strengths and limitations of current SOTA detectors. More importantly, we analyzed the potential impact of writing styles, model types, attack methods, text lengths, and real-world human writing factors on different types of detectors. We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios, evolving with advanced attack methods and thus providing more demanding evaluation to drive the development of more efficient detectors. Data and code are publicly available at: https://github.com/NLP2CT/DetectRL.
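
To make the kind of heuristic perturbation mentioned in the abstract concrete, here is a minimal Python sketch of word substitution and spelling noise. It is an illustration under stated assumptions, not the DetectRL implementation: the function names, the tiny synonym table, and the adjacent-letter swap rule are all hypothetical. The benchmark's actual attack code is in the repository linked above.

```python
import random

# Hypothetical synonym table, for illustration only; not taken from DetectRL.
SYNONYMS = {"big": "large", "quick": "fast", "show": "demonstrate"}


def substitute_words(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Replace whole words that appear in the synonym table (case-sensitive)."""
    return " ".join(synonyms.get(word, word) for word in text.split())


def add_spelling_noise(text: str, rate: float, rng: random.Random) -> str:
    """Swap two adjacent inner letters in roughly `rate` of the longer words."""
    noisy = []
    for word in text.split():
        # Only touch purely alphabetic words of four or more letters.
        if len(word) > 3 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # keep first and last letters
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy.append(word)
    return " ".join(noisy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbation is repeatable
    sample = "detectors show big gaps on quick edits"
    print(add_spelling_noise(substitute_words(sample), rate=0.5, rng=rng))
```

A perturbed copy of each LLM-generated sample could then be scored by a detector alongside the clean copy to see how far its performance drops. That evaluation loop is likewise an assumption here, not a description of the paper's protocol.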

