A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules (2404.01245v3)
Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by LLMs, also known as watermarking, has been used as a principled approach to provably distinguishing LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key -- provided by the LLM to the verifier -- to enable control of the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression for the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks -- one of which has been internally implemented at OpenAI -- and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. Through numerical experiments, these theoretically derived detection rules are shown to be competitive with, and sometimes more powerful than, existing detection approaches.
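The pivot idea in the abstract can be illustrated with a toy sketch of the Gumbel-max watermark (the one attributed to OpenAI's internal implementation). This is a minimal simulation, not the paper's implementation: the vocabulary size, Zipf-like next-token distribution, text length, and score function h(y) = -log(1 - y) are all illustrative assumptions. The key property is that for human-written text, which is independent of the secret keys, each pivot Y_t is Uniform(0, 1), so the null distribution of the detection statistic is known exactly and the false positive rate can be controlled without any knowledge of the text distribution.

```python
import numpy as np

# Toy sketch of pivot-based watermark detection (all sizes, distributions,
# and variable names here are illustrative assumptions).
rng = np.random.default_rng(0)
V, n = 50, 300                        # vocabulary size, text length

# A Zipf-like next-token distribution, reused at every step for simplicity
probs = 1.0 / np.arange(1, V + 1)
probs /= probs.sum()

# Watermarked text: token w_t = argmax_w U_w^(1/p_w) (Gumbel-max rule)
pivots_wm = np.empty(n)
for t in range(n):
    u = rng.random(V)                 # pseudorandom keys shared with verifier
    w = np.argmax(u ** (1.0 / probs))
    pivots_wm[t] = u[w]               # pivot Y_t = U_{w_t}

# Human-written text is independent of the keys, so each Y_t ~ Uniform(0, 1)
pivots_h = rng.random(n)

# Detection: sum the scores h(y) = -log(1 - y); under the null this sum is
# Gamma(n, 1), so a normal approximation gives the critical value
S_wm = np.sum(-np.log(1.0 - pivots_wm))
S_h = np.sum(-np.log(1.0 - pivots_h))
threshold = n + 2.326 * np.sqrt(n)    # upper 1% point, controlling the FPR

print(S_wm > threshold)               # watermarked text is flagged
print(S_h > threshold)                # human text is (almost surely) not
```

The choice of h is exactly the degree of freedom the framework optimizes over: any score of the uniform pivot keeps the false positive rate controlled, while different choices trade off the asymptotic false negative rate, which the paper resolves via a minimax program.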