Refined Detection for Gumbel Watermarking

Published 31 Mar 2026 in cs.LG, cs.CR, and stat.ML | (2603.30017v1)

Abstract: We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.

Authors (1)

Tor Lattimore

Summary

The paper introduces a novel statistical detection method using a truncated power-law test to identify Gumbel watermarks without accessing the underlying LLM.
The paper establishes upper and lower bounds on sample complexity based on entropy conditions, ensuring near-optimal performance under varying token distributions.
The paper demonstrates improved efficiency over traditional exponential detectors, reducing detection time in high-entropy regimes while preserving non-distortionary properties.

Overview

This work provides a new statistical detection mechanism for the Gumbel watermarking scheme in LLMs, enhancing the efficiency of watermark detection while preserving the non-distortionary property. The proposed detection method exhibits near-optimal sample complexity in a problem-dependent sense, as established by new upper and lower bounds under precise entropy-based conditions on the sequence of next-token distributions.

Gumbel Watermarking Framework

The Gumbel watermarking approach uses cryptographically seeded pseudorandomness to insert watermarks without altering the marginal distribution of the generated text. For each token $A_t$, pseudorandom uniform variables $(U_{t,a})_{a \in \Sigma}$ are generated using a secret key and a hash over preceding tokens; the token is then sampled as

$A_t = \underset{a \in \Sigma}{\arg\max} \frac{P_t(a)}{-\log U_{t,a}}.$

This method ensures the output distribution remains $P_t$ while enabling later verification by reconstructing the pseudorandom variables using the secret key.
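The sampling rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the HMAC-SHA256 construction, the context window of four tokens, and the names `pseudorandom_uniforms`, `watermarked_token`, `generate`, and `next_token_probs` are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import math


def pseudorandom_uniforms(key: bytes, context: tuple, vocab_size: int) -> list:
    """Derive one uniform in (0, 1) per vocabulary item from the key and context."""
    uniforms = []
    for a in range(vocab_size):
        msg = repr((context, a)).encode("utf-8")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        # Keep 52 bits and add a half-step so the value is never exactly 0 or 1.
        x = int.from_bytes(digest[:8], "big") >> 12
        uniforms.append((x + 0.5) / 2**52)
    return uniforms


def watermarked_token(probs: list, uniforms: list) -> int:
    """Return argmax_a P_t(a) / (-log U_{t,a}) over tokens with positive probability."""
    best_token, best_score = -1, -math.inf
    for a, (p, u) in enumerate(zip(probs, uniforms)):
        if p > 0.0:
            score = p / (-math.log(u))
            if score > best_score:
                best_token, best_score = a, score
    return best_token


def generate(key: bytes, next_token_probs, length: int, window: int = 4) -> list:
    """Generate `length` tokens; `next_token_probs(prefix)` stands in for the LLM."""
    tokens = []
    for _ in range(length):
        probs = next_token_probs(tokens)
        # Assumption: the hash covers only the last `window` tokens.
        context = tuple(tokens[-window:])
        uniforms = pseudorandom_uniforms(key, context, len(probs))
        tokens.append(watermarked_token(probs, uniforms))
    return tokens
```

Because the uniforms depend only on the key and the context, anyone holding the key can recompute them from the text alone. Note that a repeated context reproduces the same uniforms, so with a fixed toy distribution the output eventually cycles; the marginal-preservation property is best checked across distinct contexts.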

Model-Agnostic Detection: Statistical Formulation

The detection challenge is to distinguish watermarked text from non-watermarked text without access to either the underlying LLM or its distribution. Let $V_t = U_{t,A_t}$ for each token $A_t$. Under the null hypothesis (non-watermarked text), $(V_t)_{t=1}^n$ are i.i.d. uniform variables. Under watermarked text, the argmax rule favors tokens with large $U_{t,a}$, so the distribution of $V_t$ shifts toward 1 whenever $P_t$ is not deterministic. Detection therefore reduces to a statistical test for uniformity of $(V_t)$, implementable without model access.
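A small simulation can make the departure from uniformity concrete. The sketch below draws fresh uniforms with Python's `random` module instead of a keyed hash (an assumption made purely to keep the example short) and averages $V_t$ for a few toy next-token distributions; `sample_v` and `mean_v` are hypothetical helper names.

```python
import math
import random


def sample_v(probs: list, rng: random.Random) -> float:
    """Draw fresh uniforms, apply the Gumbel rule, and return V = U of the chosen token."""
    best_u, best_cost = 1.0, math.inf
    for p in probs:
        if p > 0.0:
            u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
            # Minimising -log(u) / p is equivalent to maximising p / (-log(u)).
            cost = -math.log(u) / p
            if cost < best_cost:
                best_u, best_cost = u, cost
    return best_u


def mean_v(probs: list, n: int, seed: int = 0) -> float:
    """Average V over n simulated watermarked tokens drawn from a fixed P_t."""
    rng = random.Random(seed)
    return sum(sample_v(probs, rng) for _ in range(n)) / n


if __name__ == "__main__":
    print("uniform over 4 tokens:", mean_v([0.25] * 4, 20000))
    print("deterministic token:  ", mean_v([1.0, 0.0, 0.0], 20000))
```

For a uniform distribution over four tokens the rule picks the token with the largest uniform, so the average should land near 4/5. For a deterministic distribution $V_t$ is itself uniform and the average stays near 1/2, which is why no detector can succeed on zero-entropy text.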

Power-Law Statistic: Theory and Bounds

The crux of the contribution is the introduction of a truncated power-law test statistic defined as:

$S(V_t) = \min\left(\frac{1}{\sqrt{\epsilon}}, \frac{1}{\sqrt{1-V_t}}\right) - \mu.$

Here, $\epsilon$ is a truncation level tied to the target false positive rate, and $\mu$ is a normalization constant ensuring $\mathbb{E}[S(V_t)] = 0$ when $V_t$ is uniform on $[0, 1]$. Detection is performed by checking whether $\sum_{t=1}^n S(V_t)$ exceeds a threshold chosen so that the false positive rate stays at or below the target level.
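A minimal sketch of the truncated statistic follows. It assumes $\mu$ is chosen so that the score has zero mean for uniform $V_t$, which by direct integration gives $\mu = 2 - \sqrt{\epsilon}$. The value of `eps`, the names `delta` and `tau`, and the Monte Carlo threshold calibration are illustrative assumptions that stand in for the paper's analytical choices, which are not reproduced here.

```python
import math
import random


def power_law_score(v: float, eps: float) -> float:
    """Truncated power-law score min(1/sqrt(eps), 1/sqrt(1 - v)) - mu."""
    mu = 2.0 - math.sqrt(eps)  # mean of the truncated term when v is uniform on [0, 1]
    # min(1/sqrt(eps), 1/sqrt(1 - v)) equals 1/sqrt(max(eps, 1 - v)),
    # which also avoids dividing by zero at v = 1.
    return 1.0 / math.sqrt(max(eps, 1.0 - v)) - mu


def calibrate_threshold(n: int, eps: float, delta: float,
                        trials: int = 2000, seed: int = 0) -> float:
    """Assumed calibration: empirical (1 - delta) quantile of the null sum of n scores."""
    rng = random.Random(seed)
    sums = sorted(
        sum(power_law_score(rng.random(), eps) for _ in range(n))
        for _ in range(trials)
    )
    index = min(trials - 1, math.ceil((1.0 - delta) * trials))
    return sums[index]


def detect(vs: list, eps: float, tau: float) -> bool:
    """Flag the text as watermarked when the summed score exceeds the threshold."""
    return sum(power_law_score(v, eps) for v in vs) > tau
```

A typical call would compute `tau = calibrate_threshold(len(vs), eps=1e-3, delta=0.05)` once per text length and then evaluate `detect(vs, 1e-3, tau)`. The Monte Carlo quantile only approximates the target false positive rate; a concentration bound would be needed for a strict guarantee.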

Upper Bound (Detection Guarantee)

The main theorem shows that, when the detector is applied to $n$ tokens with next-token distributions $P_1, \ldots, P_n$, the probability of a missed detection (type II error) is bounded once an entropy-like measure, aggregating the "randomness" present in the LLM's output across those tokens, is sufficiently large. This condition in turn determines the number of tokens required for reliable detection.

Lower Bound (Limitation for All Model-Agnostic Schemes)

A minimax lower bound is also established. It applies to all valid selection kernels (including arbitrary watermarking schemes) and shows that no model-agnostic detector can outperform the power-law method by more than logarithmic factors. Specifically, if the next-token distributions $P_1, \ldots, P_n$ are drawn i.i.d. from a symmetric law, then any detection method that achieves a given error level requires a number of tokens matching the upper bound up to logarithmic factors.

This establishes the near-optimality of the proposed detector in sample complexity.

Comparisons and Practical Considerations

Comparison to Existing Methods

The analysis situates this method between Gumbel-based exponential detectors, whose detection time is governed by the mean entropy of the next-token distributions, and heavy-tailed alternatives. The power-law statistic achieves a lower sample complexity under suitable conditions, a provable improvement. However, experiments indicate that for real-world language distributions (where large-entropy regimes dominate), practical gains may be tempered by constant and logarithmic factors. Combining the new detector with previous methods via a union bound can sometimes yield improved practical detection probabilities.
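The union-bound combination can be sketched as below. Two assumptions are made loudly here: the "exponential detector" is taken to be the statistic $\sum_t -\log(1 - V_t)$, which follows a Gamma$(n, 1)$ law for uniform $V_t$, and the power-law p-value is estimated by Monte Carlo with $\mu = 2 - \sqrt{\epsilon}$. The function names and the default `eps` are hypothetical.

```python
import math
import random
import sys


def erlang_tail(n: int, x: float) -> float:
    """P(Gamma(n, 1) >= x) = exp(-x) * sum_{k<n} x^k / k!, computed in log space."""
    if x <= 0.0:
        return 1.0
    logs = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(l - top) for l in logs))


def exponential_p_value(vs: list) -> float:
    """Null p-value of sum_t -log(1 - V_t), which is Gamma(n, 1) for uniform V_t."""
    score = sum(-math.log(max(1.0 - v, sys.float_info.min)) for v in vs)
    return erlang_tail(len(vs), score)


def power_law_p_value(vs: list, eps: float, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo p-value of the summed truncated power-law score (assumed calibration)."""
    mu = 2.0 - math.sqrt(eps)

    def total(values) -> float:
        return sum(1.0 / math.sqrt(max(eps, 1.0 - v)) - mu for v in values)

    rng = random.Random(seed)
    observed = total(vs)
    hits = sum(total(rng.random() for _ in vs) >= observed for _ in range(trials))
    # The +1 terms keep the estimate a valid (slightly conservative) p-value.
    return (1 + hits) / (1 + trials)


def combined_detect(vs: list, delta: float, eps: float = 1e-3) -> bool:
    """Reject the null if either test rejects at level delta / 2 (union bound)."""
    p_values = [exponential_p_value(vs), power_law_p_value(vs, eps)]
    return min(p_values) <= delta / len(p_values)
```

Splitting `delta` evenly keeps the overall false positive rate at or below `delta` regardless of how the two statistics are correlated. With 2000 Monte Carlo trials the smallest attainable power-law p-value is 1/2001, so very small `delta` values would need more trials or an analytical threshold.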

Theoretical Implications

The theoretical innovation lies in characterizing detection times precisely as a function of the entropy spectrum across tokens. The work clarifies when undetectability is inevitable (e.g., for deterministic, low-entropy outputs) and quantifies detection time under mixtures of high and low entropy. These fine-grained distinctions are absent in analyses based solely on average entropy.

Model (In)dependence

The detector's agnosticism to the LLM is a core strength: it requires neither access to model internals nor any prior on $P_t$, only the secret key needed to reconstruct the pseudorandom variables. The analysis also suggests that further statistical power is achievable if side information or even coarse surrogates for $P_t$ are available, opening up the potential for learned hybrid detection mechanisms.

Directions for Future Research

Several directions are evident from the results:

Hybrid Detection: Leveraging low-fidelity models of $P_t$ for adaptive or weighted detection strategies.
Adaptive/Anytime Testing: Extending power-law detection to support variable-length or streaming contexts, following recent advances (Huang et al., 19 Feb 2026).
Robustness to Human Edits: Investigating resilience to adversarial or benign alterations, as in (He et al., 4 Oct 2025).
Optimal Heaviness: Exploring alternative statistics with even heavier tails or constructing fully adaptive test statistics based on empirical entropy observations.
Reduction of Logarithmic Gaps: Closing the remaining log-factor gaps between upper and lower sample complexity bounds.

Conclusion

This work articulates a nearly optimal, model-agnostic detection method for Gumbel watermarking in LLMs, based on a theoretically refined truncation of power-law goodness-of-fit statistics. The provided upper and lower bounds, derived under minimal assumptions, clarify the efficiency limits of non-distortionary watermark detection and highlight the importance of entropy structure in practical watermark detection scenarios. This foundation provides a firm basis for future advancements in both watermarking theory and real-world scalable LLM provenance verification.
