Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack (2404.01907v1)

Published 2 Apr 2024 in cs.CL, cs.CR, and cs.LG

Abstract: With the development of LLMs, detecting whether text is machine-generated has become increasingly challenging, even as it grows more important for curbing the spread of false information, protecting intellectual property, and preventing academic plagiarism. While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors are vulnerable to adversarial attacks such as paraphrasing. In this paper, we propose a framework for a broader class of adversarial attacks, designed to apply minor perturbations to machine-generated content so that it evades detection. We consider two attack settings, white-box and black-box, and employ adversarial learning in dynamic scenarios to assess whether the robustness of current detection models against such attacks can be enhanced. The empirical results reveal that current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content. Furthermore, we explore the prospect of improving model robustness through iterative adversarial learning. Although some improvements in robustness are observed, practical applications still face significant challenges. These findings shed light on the future development of AI-text detectors, emphasizing the need for more accurate and robust detection methods.
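
The black-box side of the attack class described in the abstract can be made concrete with a small query-based example: repeatedly score a candidate text with a detector and greedily substitute words until the detector's "machine-generated" score drops below a threshold. The sketch below is illustrative only and is not the authors' framework; the detector checkpoint (openai-community/roberta-base-openai-detector), the label index, the WordNet synonym source, and the greedy word-substitution search are all assumptions made for the sake of a runnable demo.

```python
# Minimal sketch of a black-box word-substitution attack on an AI-text detector.
# The detector checkpoint, label index, synonym source, and greedy search are
# illustrative assumptions, not the method proposed in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from nltk.corpus import wordnet  # requires a one-time nltk.download("wordnet")

DETECTOR = "openai-community/roberta-base-openai-detector"
MACHINE_LABEL = 0  # assumed index for "machine-generated"; check the model's id2label mapping

tokenizer = AutoTokenizer.from_pretrained(DETECTOR)
model = AutoModelForSequenceClassification.from_pretrained(DETECTOR)
model.eval()

def machine_prob(text: str) -> float:
    """Return the detector's probability that `text` is machine-generated."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, MACHINE_LABEL].item()

def synonyms(word: str) -> list[str]:
    """Collect single-word WordNet synonyms as candidate substitutions."""
    cands = {l.name().replace("_", " ") for s in wordnet.synsets(word) for l in s.lemmas()}
    return [c for c in cands if c.lower() != word.lower() and " " not in c]

def greedy_attack(text: str, threshold: float = 0.5, max_edits: int = 10) -> str:
    """Greedily swap words for synonyms until the detector scores the text as human-written."""
    words = text.split()
    for _ in range(max_edits):
        score = machine_prob(" ".join(words))
        if score < threshold:            # detector now labels the text as human-written
            break
        best_drop, best_edit = 0.0, None
        for i, w in enumerate(words):    # try every single-word substitution
            for cand in synonyms(w):
                trial = words[:i] + [cand] + words[i + 1:]
                drop = score - machine_prob(" ".join(trial))
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, cand)
        if best_edit is None:            # no substitution lowers the score any further
            break
        i, cand = best_edit
        words[i] = cand
    return " ".join(words)

if __name__ == "__main__":
    sample = "The model architecture consists of stacked transformer layers."
    print(machine_prob(sample), "->", machine_prob(greedy_attack(sample)))
```

In a white-box setting the same loop could rank candidate substitutions by gradient saliency instead of re-querying the detector for every trial edit; the query-only version above needs nothing beyond the detector's output scores, which is what makes such perturbation attacks cheap to mount in practice.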

Authors (3)
  1. Ying Zhou (85 papers)
  2. Ben He (37 papers)
  3. Le Sun (111 papers)
Citations (2)