Towards Optimal Statistical Watermarking
Abstract: We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework that subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized in practice by pseudo-random generators, that allows non-trivial trade-offs between the Type I error and the Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights the potential for improvement over the $h^{-2}$ rate in previous works. Moreover, we formulate the robust watermarking problem, in which the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment of the watermarking problem with near-optimal rates in the i.i.d. setting, which may be of interest for future work.
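To get a feel for the gap between the two rates quoted in the abstract, the following minimal sketch evaluates $h^{-1} \log(1/h)$ and $h^{-2}$ for a few small values of $h$. It is only an illustration: the hidden constants in both rates are set to 1, the logarithm is taken to be natural, and the function names are hypothetical. None of these choices comes from the paper, so the numbers indicate scaling, not actual token counts.

```python
import math


def rate_this_paper(h: float) -> float:
    """h^{-1} * log(1/h), with the hidden constant set to 1 (an assumption)."""
    return math.log(1.0 / h) / h


def rate_previous(h: float) -> float:
    """h^{-2}, with the hidden constant set to 1 (an assumption)."""
    return 1.0 / (h * h)


def rate_table(entropies):
    """Return (h, new_rate, old_rate) tuples for each h in (0, 1)."""
    return [(h, rate_this_paper(h), rate_previous(h)) for h in entropies]


if __name__ == "__main__":
    for h, new, old in rate_table([0.5, 0.1, 0.01]):
        print(f"h={h:<5} h^-1 log(1/h)={new:10.1f}  h^-2={old:10.1f}  ratio={old / new:6.1f}")
```

Because $\log(1/h) < 1/h$ for every $h \in (0, 1)$, the first quantity is always the smaller one, and the ratio between the two grows as the average entropy per token shrinks.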