
No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices (2402.16187v3)

Published 25 Feb 2024 in cs.CR, cs.CL, and cs.LG

Abstract: Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack -- leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.
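For context, the "common watermarking systems" the paper studies include hash-based green-list schemes in the style of Kirchenbauer et al. (ICML 2023). Below is a minimal, illustrative sketch of that family, not the paper's exact construction; the vocabulary size, key, and bias parameters are assumptions chosen for readability.

```python
import hashlib
import random

# Minimal sketch of a hash-based "green list" watermark (after
# Kirchenbauer et al., ICML 2023). All constants below are illustrative
# assumptions, not values taken from the paper under discussion.

VOCAB_SIZE = 50_000   # assumed vocabulary size
GAMMA = 0.5           # fraction of the vocabulary placed on the green list
DELTA = 2.0           # logit bias added to green tokens at generation time

def green_list(prev_token: int, key: int = 42) -> set[int]:
    """Pseudorandomly partition the vocabulary using the previous token.

    Hashing (key, prev_token) seeds the partition, so a detector holding
    the key can recompute the same green list without model access.
    """
    seed = int.from_bytes(
        hashlib.sha256(f"{key}:{prev_token}".encode()).digest()[:8], "big"
    )
    rng = random.Random(seed)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])

def detect(tokens: list[int], key: int = 42) -> float:
    """Return the z-score of the observed green-token count.

    Under the null hypothesis (unwatermarked text), each token lands on
    the green list with probability GAMMA; a large z-score flags the
    text as watermarked.
    """
    hits = sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, key)
    )
    n = len(tokens) - 1
    mean, var = GAMMA * n, GAMMA * (1 - GAMMA) * n
    return (hits - mean) / var ** 0.5
```

At generation time, the same partition would be used to add DELTA to the logits of green-listed tokens before sampling. Because detection depends only on recomputing the hash from surrounding context, this family of schemes exposes exactly the levers the paper probes: the hashing context, the bias strength, and key management each trade robustness against utility and usability.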

Authors (4)
  1. Qi Pang
  2. Shengyuan Hu
  3. Wenting Zheng
  4. Virginia Smith