Towards Optimal Statistical Watermarking

Published 13 Dec 2023 in cs.LG, cs.CL, cs.CR, cs.IT, math.IT, and stat.ML | (2312.07930v3)

Abstract: We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights potentials for improvement from the rate of $h^{-2}$ in the previous works. Moreover, we formulate the robust watermarking problem where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment on the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future works.

Summary

The paper introduces a hypothesis testing framework that optimally trades off Type I and Type II errors for watermark detection.
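For outputs made of i.i.d. tokens, it establishes nearly matching upper and lower bounds on the number of tokens needed for small Type I and Type II errors, at a rate of $\Theta(h^{-1} \log (1/h))$ in the average entropy per token $h$, compared with $h^{-2}$ in previous work.
It extends the analysis to model-agnostic watermarking, where it characterizes the minimax Type II error, and to robust watermarking, where the optimal Type II error under a class of text perturbations is characterized via a linear program.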

Introduction to Statistical Watermarking

Statistical watermarking addresses the challenge of determining whether a piece of text was produced by a human or by a large language model (LLM). The process embeds signals into the generated text that can later be detected to reveal its source. A good watermarking scheme is distortion-free (it preserves the model's output distribution), agnostic (it detects the watermark without knowing the model or prompt used), and robust (it still detects the watermark when the text is slightly perturbed). Previous watermarking methods have lacked a systematic statistical formulation, which makes it difficult to compare them or to say how far they are from optimal.

Hypothesis Testing and Watermarking

This paper formulates statistical watermarking as a hypothesis testing problem and studies the trade-off between Type I error (flagging text that was not watermarked) and Type II error (missing text that was). The main contributions are defining a watermarking scheme as a hypothesis test with a random rejection region, characterizing the Uniformly Most Powerful (UMP) watermarking scheme, and identifying the optimal Type II error. The key ingredient is a coupling of the output tokens and the rejection region, realized in practice by pseudo-random generators. The resulting framework subsumes previous statistical watermarking methods and gives them a common theoretical basis.
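To make the idea of a coupled rejection region concrete, here is a minimal sketch of one simple coupling over a small finite output space. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the function names, the toy distribution `rho`, and the rule "the region is either the single sampled output or empty" are all hypothetical choices made for this example, and every probability in `rho` is assumed to be strictly positive. The output is still drawn exactly from `rho`, so the sketch is distortion-free in the sense described above.

```python
import random
from typing import Dict, Set, Tuple


def sample_output_and_region(
    rho: Dict[str, float], alpha: float, rng: random.Random
) -> Tuple[str, Set[str]]:
    """Draw an output x ~ rho together with a coupled rejection region R.

    Hypothetical rule for this sketch: R = {x} with probability
    min(1, alpha / rho[x]), and R is empty otherwise.
    """
    tokens = list(rho)
    x = rng.choices(tokens, weights=[rho[t] for t in tokens], k=1)[0]
    in_region = rng.random() < min(1.0, alpha / rho[x])
    return x, ({x} if in_region else set())


def false_alarm_bound(rho: Dict[str, float], alpha: float) -> float:
    """Largest probability that a fixed outside text y lands in R.

    For a fixed y, P(y in R) = rho[y] * min(1, alpha / rho[y]) = min(rho[y], alpha),
    so any text chosen independently of R is flagged with probability <= alpha.
    """
    return max(min(p, alpha) for p in rho.values())


def miss_probability(rho: Dict[str, float], alpha: float) -> float:
    """Probability that the watermarked output itself falls outside R.

    P(x not in R) = sum_x rho[x] * (1 - min(1, alpha / rho[x]))
                  = sum_x max(rho[x] - alpha, 0).
    """
    return sum(max(p - alpha, 0.0) for p in rho.values())
```

In this toy coupling the detector simply checks whether the observed text lies in the region. The miss probability is large when `rho` puts a lot of mass on a few outputs and small when the mass is spread out, which matches the intuition that low-entropy outputs are harder to watermark.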

Performance in i.i.d. Settings

When the output is a sequence of $n$ independent and identically distributed (i.i.d.) tokens, the paper establishes nearly matching upper and lower bounds on the number of tokens required to guarantee small Type I and Type II errors. The resulting rate is $\Theta(h^{-1} \log (1/h))$ in the average entropy per token $h$, compared with $h^{-2}$ in previous work. In other words, a scheme tuned to the i.i.d. setting needs substantially fewer tokens to tell watermarked model output apart from text generated independently of the watermark, and the gap widens as the per-token entropy shrinks.
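The difference between the two rates is easiest to see numerically. The short sketch below evaluates $h^{-1} \log (1/h)$ and $h^{-2}$ for a given entropy value; it ignores all constants and any dependence on the target error levels, so the numbers only indicate scaling and are not actual token counts. The function names are hypothetical, and $0 < h < 1$ is assumed.

```python
import math


def near_optimal_scaling(h: float) -> float:
    """Scaling h^{-1} * log(1/h), constants dropped; assumes 0 < h < 1."""
    return math.log(1.0 / h) / h


def previous_scaling(h: float) -> float:
    """Scaling h^{-2} attributed to previous work, constants dropped."""
    return 1.0 / (h * h)


def scaling_ratio(h: float) -> float:
    """How many times larger the h^{-2} scaling is than h^{-1} * log(1/h)."""
    return previous_scaling(h) / near_optimal_scaling(h)


if __name__ == "__main__":
    for h in (0.5, 0.1, 0.01):
        print(h, near_optimal_scaling(h), previous_scaling(h), scaling_ratio(h))
```

The ratio equals $1 / (h \log (1/h))$, which grows without bound as $h$ approaches zero, so the improvement matters most for low-entropy text.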

Extensions to Watermarking

The paper further studies model-agnostic watermarking, where the watermark must be detectable without knowledge of the model's output distribution, a requirement that matters in practical deployments. In this setting it characterizes the minimax Type II error. It also formulates robust watermarking, in which the user may apply a class of perturbations to the generated text, and characterizes the optimal Type II error of robust UMP tests through a linear programming problem.

In conclusion, the paper gives a systematic statistical treatment of the watermarking problem, with near-optimal rates in the i.i.d. setting and characterizations of the optimal Type II error in the general, model-agnostic, and robust settings. These results provide a principled basis for tracing the outputs of LLMs, which could help reduce their misuse.
