
Undetectable Watermarks for Language Models (2306.09194v1)

Published 25 May 2023 in cs.CR, cs.CL, and cs.LG

Abstract: Recent advances in the capabilities of LLMs such as GPT-4 have spurred increasing concern about our ability to detect AI-generated text. Prior works have suggested methods of embedding watermarks in model outputs, by noticeably altering the output distribution. We ask: Is it possible to introduce a watermark without incurring any detectable change to the output distribution? To this end we introduce a cryptographically-inspired notion of undetectable watermarks for LLMs. That is, watermarks can be detected only with the knowledge of a secret key; without the secret key, it is computationally intractable to distinguish watermarked outputs from those of the original model. In particular, it is impossible for a user to observe any degradation in the quality of the text. Crucially, watermarks should remain undetectable even when the user is allowed to adaptively query the model with arbitrarily chosen prompts. We construct undetectable watermarks based on the existence of one-way functions, a standard assumption in cryptography.
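The core idea in the abstract — a watermark detectable only with a secret key, embedded without changing the output distribution — can be illustrated with a toy sketch. This is not the paper's construction; it is a simplified, hedged illustration (closer in spirit to PRF-seeded sampling) for a binary-token model: a keyed PRF (here HMAC-SHA256, an assumption for illustration) produces a pseudorandom uniform draw per position, generation samples each bit by inverse-transform sampling against that draw (so each bit keeps its model probability), and the detector, given only the key and the bits, scores how tightly each bit agrees with the recomputed draws.

```python
import hmac
import hashlib
import math

def prf_uniform(key: bytes, i: int) -> float:
    # Keyed pseudorandom real in [0, 1) for position i.
    # HMAC-SHA256 stands in for a PRF; any PRF (hence any one-way
    # function, via standard constructions) would do.
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def generate(key: bytes, probs: list[float]) -> list[int]:
    # probs[i] is the model's Pr[bit_i = 1].  Emit 1 iff the keyed draw
    # u_i falls below probs[i].  Because u_i is (pseudo)uniform, bit_i
    # still has marginal probability probs[i]: without the key, the
    # output is computationally indistinguishable from ordinary sampling.
    return [1 if prf_uniform(key, i) < p else 0 for i, p in enumerate(probs)]

def detect_score(key: bytes, bits: list[int]) -> float:
    # Detection needs only the key, not the model.  With the right key,
    # c_i = u_i (if bit 1) or 1 - u_i (if bit 0) is squeezed below the
    # model probability of the emitted bit, so -ln(c_i) averages above 1;
    # for key-independent text it averages exactly 1.
    total = 0.0
    for i, b in enumerate(bits):
        u = prf_uniform(key, i)
        c = u if b == 1 else 1.0 - u
        total += -math.log(max(c, 1e-12))
    return total
```

A detector would flag text whose score exceeds roughly n (the length) by a few standard deviations; with the wrong key the score concentrates around n, which is the "undetectable without the key" property in miniature.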
