Watermarking Language Models for Many Adaptive Users (2405.11109v2)
Abstract: We study watermarking schemes for LLMs with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting, which arises whenever a user queries an LLM more than once, as even benign users do. Moreover, with a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust). Importantly, our scheme provides both zero-bit and multi-user assurances simultaneously: it detects short snippets just as well as the underlying zero-bit scheme, and it traces longer excerpts to individual users. The main technical component is a construction of message-embedding watermarks from zero-bit watermarks; ours is the first generic reduction between watermarking schemes for LLMs. A challenge for such reductions is the lack of a unified abstraction for robustness, that is, the guarantee that marked text remains detectable even after edits. We introduce a new unifying abstraction called AEB-robustness, which guarantees that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output.
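To make the blockwise idea behind a message-embedding-from-zero-bit reduction concrete, here is a minimal Python sketch: each bit of a message (e.g., one user's fingerprinting codeword) selects one of two zero-bit watermarking keys for the next block of text. This is an illustrative assumption-laden toy, not the paper's actual construction: the zero-bit scheme below is a simple keyed green-list filter standing in for a real undetectable scheme such as Christ-Gunn-Zamir, and all names (`generate_block`, `detect_block`, the PRF, the threshold) are hypothetical.

```python
import hashlib
import secrets

def prf_bit(key: bytes, token: str) -> int:
    """Toy keyed PRF mapping a token to a pseudorandom bit."""
    return hashlib.sha256(key + token.encode()).digest()[0] & 1

def generate_block(key: bytes, vocab: list[str], length: int) -> list[str]:
    """Toy zero-bit watermark: emit only tokens whose PRF bit is 1.

    A real scheme would subtly bias a language model's sampling; this
    stand-in just filters the vocabulary so the detector below works.
    """
    green = [t for t in vocab if prf_bit(key, t) == 1]
    return [secrets.choice(green) for _ in range(length)]

def detect_block(key: bytes, block: list[str], thresh: float = 0.9) -> bool:
    """Zero-bit detection: is the green-token fraction implausibly high?"""
    score = sum(prf_bit(key, t) for t in block) / max(len(block), 1)
    return score >= thresh

def embed_message(keys: tuple[bytes, bytes], message: list[int],
                  vocab: list[str], block_len: int = 32) -> list[str]:
    """Message-embedding from zero-bit: block i is marked under keys[m_i]."""
    text: list[str] = []
    for bit in message:
        text.extend(generate_block(keys[bit], vocab, block_len))
    return text

def decode_message(keys: tuple[bytes, bytes], text: list[str],
                   block_len: int = 32) -> list:
    """Recover bits blockwise; a block where neither (or both) detector
    fires is reported as an erasure (None)."""
    bits = []
    for i in range(0, len(text), block_len):
        block = text[i:i + block_len]
        hits = [b for b in (0, 1) if detect_block(keys[b], block)]
        bits.append(hits[0] if len(hits) == 1 else None)
    return bits

if __name__ == "__main__":
    vocab = [f"tok{i}" for i in range(1000)]
    keys = (secrets.token_bytes(16), secrets.token_bytes(16))
    codeword = [1, 0, 1, 1, 0]  # e.g., one user's fingerprinting codeword
    marked = embed_message(keys, codeword, vocab)
    print(decode_message(keys, marked))  # [1, 0, 1, 1, 0] w.h.p.
```

Tracing a colluding group would then feed the recovered (possibly erased) codeword to the accusation algorithm of a collusion-secure fingerprinting code in the style of Boneh-Shaw or Tardos, and a robust reduction must additionally tolerate edited text, which is where the AEB-robustness abstraction enters.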
- Scott Aaronson. My AI safety lecture for UT Effective Altruism, November 2022.
- Bob Gleichauf and Dan Geer. Digital watermarks are not ready for large language models. Lawfare, 2024.
- Dan Boneh, Aggelos Kiayias, and Hart William Montgomery. Robust fingerprinting codes: a near optimal construction. In Proceedings of the Tenth Annual ACM Workshop on Digital Rights Management, DRM ’10, pages 3–12, New York, NY, USA, 2010. Association for Computing Machinery.
- Dan Boneh and James Shaw. Collusion-secure fingerprinting for digital data. IEEE Transactions on Information Theory, 44(5):1897–1905, 1998.
- Christian Cachin. An information-theoretic model for steganography. Cryptology ePrint Archive, Report 2000/028, 2000. https://eprint.iacr.org/2000/028.
- Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. Cryptology ePrint Archive, Paper 2024/235, 2024. https://eprint.iacr.org/2024/235.
- Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. Cryptology ePrint Archive, Paper 2023/763, 2023. https://eprint.iacr.org/2023/763.
- Towards better statistical understanding of watermarking LLMs, 2024.
- Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang. Publicly detectable watermarking for language models. Cryptology ePrint Archive, Paper 2023/1661, 2023. https://eprint.iacr.org/2023/1661.
- Jamie Hayes and George Danezis. Generating steganographic images via adversarial training. Advances in Neural Information Processing Systems, 30, 2017.
- Nicholas J. Hopper. Toward a theory of steganography. Technical report, 2004.
- White House. Blueprint for an AI Bill of Rights. Office of Science and Technology Policy, 2023.
- White House. Fact sheet: Biden-Harris administration secures voluntary commitments from leading artificial intelligence companies to manage the risks posed by AI. Statements and Releases, 2023.
- White House. Fact sheet: President Biden issues executive order on safe, secure, and trustworthy artificial intelligence. Statements and Releases, 2023.
- Watermark-based detection and attribution of AI-generated content, 2024.
- John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 17061–17084. PMLR, 23–29 Jul 2023.
- John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for large language models. arXiv preprint arXiv:2306.04634, 2023.
- Gabriel Kaptchuk, Tushar M. Jois, Matthew Green, and Aviel D. Rubin. Meteor: Cryptographically secure steganography for realistic distributions. In Giovanni Vigna and Elaine Shi, editors, ACM CCS 2021: 28th Conference on Computer and Communications Security, pages 1529–1548, Virtual Event, Republic of Korea, November 15–19, 2021. ACM Press.
- Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models, 2023.
- A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules, 2024.
- Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
- Koji Nuida, Satoshi Fujitsu, Manabu Hagiwara, Takashi Kitagawa, Hajime Watanabe, Kazuto Ogawa, and Hideki Imai. An improvement of Tardos’s collusion-secure fingerprinting codes with very short lengths. In Serdar Boztaş and Hsiao-Feng (Francis) Lu, editors, Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, pages 80–89, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
- Topic-based watermarks for LLM-generated text, 2024.
- Attacking LLM watermarks by exploiting their strengths, 2024.
- Provably robust multi-bit watermarking for AI-generated text via error correction code, 2024.
- Sven Gowal and Pushmeet Kohli. Identifying AI-generated images with SynthID. Google DeepMind, 2023.
- Siddarth Srinivasan. Detecting AI fingerprints: A guide to watermarking and beyond. Brookings, 2024.
- Gábor Tardos. Optimal probabilistic fingerprint codes. J. ACM, 55(2), May 2008.
- Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. Advances in Neural Information Processing Systems, 36, 2024.
- Learning to watermark LLM-generated text via reinforcement learning, 2024.
- Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text, 2023.
- Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: Impossibility of strong watermarking for generative models, 2023.
- Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. HiDDeN: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 657–672, 2018.