
Refined Detection for Gumbel Watermarking

Published 31 Mar 2026 in cs.LG, cs.CR, and stat.ML | (2603.30017v1)

Abstract: We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.

Authors (1)

Summary

  • The paper introduces a novel statistical detection method using a truncated power-law test to identify Gumbel watermarks without accessing the underlying LLM.
  • The paper establishes upper and lower bounds on sample complexity based on entropy conditions, ensuring near-optimal performance under varying token distributions.
  • The paper demonstrates improved efficiency over traditional exponential detectors, reducing detection time in high-entropy regimes while preserving non-distortionary properties.

Overview

This work provides a new statistical detection mechanism for the Gumbel watermarking scheme in LLMs, enhancing the efficiency of watermark detection while preserving the non-distortionary property. The proposed detection method exhibits near-optimal sample complexity in a problem-dependent sense, as established by new upper and lower bounds under precise entropy-based conditions on the sequence of next-token distributions.

Gumbel Watermarking Framework

The Gumbel watermarking approach uses cryptographically seeded pseudorandomness to embed a watermark without altering the marginal distribution of the generated text. For each position $t$, a vector of pseudorandom uniforms $(U_{t,a})_{a \in \Sigma}$, one per vocabulary item, is generated from a secret key and a hash of the preceding tokens; the token $A_t$ is then sampled as

$$A_t = \underset{a \in \Sigma}{\arg\max} \frac{P_t(a)}{-\log U_{t,a}}.$$

Because the $U_{t,a}$ behave as i.i.d. uniforms, this rule selects each token $a$ with probability exactly $P_t(a)$, so the output distribution remains $P_t$, while the secret key allows the pseudorandom variables to be reconstructed later for verification.
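A minimal Python sketch of this sampling rule follows. It assumes a hypothetical HMAC-SHA256 construction for the keyed uniforms (`keyed_uniform`, which hashes the candidate token together with a tuple of preceding tokens) and a toy dictionary of probabilities in place of an LLM; none of these names come from the paper. Since $-\log U_{t,a}$ is Exp(1), the ratio $-\log U_{t,a} / P_t(a)$ is exponential with rate $P_t(a)$, and the token attaining the smallest such value (equivalently, the argmax above) is $a$ with probability exactly $P_t(a)$.

```python
import hashlib
import hmac
import math


def keyed_uniform(key: bytes, context: tuple, token: int) -> float:
    """Hypothetical PRF mapping (key, preceding tokens, candidate token) to (0, 1).

    HMAC-SHA256 stands in for the cryptographic seeding; the exact hashing
    scheme used in practice is an assumption here.
    """
    digest = hmac.new(key, repr((context, token)).encode(), hashlib.sha256).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 pseudorandom bits
    return (x + 0.5) / 2**52  # strictly inside (0, 1), so -log(u) is finite and positive


def gumbel_sample(probs: dict, key: bytes, context: tuple) -> int:
    """Return argmax_a P(a) / (-log U_a) over tokens with positive probability."""
    best, best_score = None, -math.inf
    for a, p in probs.items():
        if p <= 0.0:
            continue  # zero-probability tokens can never win the argmax
        score = p / -math.log(keyed_uniform(key, context, a))
        if score > best_score:
            best, best_score = a, score
    return best


if __name__ == "__main__":
    key = b"secret-key"
    probs = {0: 0.5, 1: 0.3, 2: 0.2}  # toy stand-in for P_t
    tokens = []
    for t in range(10):
        context = tuple(tokens[-4:])  # last 4 tokens seed the uniforms (assumed window)
        tokens.append(gumbel_sample(probs, key, context))
    print(tokens)
```

Running the loop twice with the same key reproduces the same tokens, which is what lets a key holder recompute the $U_{t,a}$ later. With a short fixed window, repeated contexts reuse the same uniforms; this sketch does not address that.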

Model-Agnostic Detection: Statistical Formulation

The detection challenge is to distinguish watermarked from non-watermarked text without access to either the underlying LLM or its next-token distributions. Let $V_t = U_{t,A_t}$ be the pseudorandom value attached to the emitted token $A_t$. Under the null hypothesis (non-watermarked text), $(V_t)_{t=1}^n$ are i.i.d. uniform on $[0,1]$. Under the alternative (watermarked text), their distribution is skewed toward 1, because the sampling rule favors tokens whose $U_{t,a}$ is large. Detection therefore reduces to a statistical test for uniformity of $(V_t)$, which can be carried out without model access.
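On the detection side, only the key and the observed tokens are needed to recompute $V_t = U_{t,A_t}$. The sketch below assumes the same hypothetical HMAC-SHA256 seeding that a generator would use, with a window of four preceding tokens (`WINDOW`, an assumed value); `generate` is a toy watermarked generator with a fixed next-token distribution, included only to produce test input.

```python
import hashlib
import hmac
import math
import random

WINDOW = 4  # hypothetical: number of preceding tokens hashed into the seed


def keyed_uniform(secret: bytes, context: tuple, token: int) -> float:
    """Hypothetical HMAC-SHA256 PRF returning a float strictly inside (0, 1)."""
    digest = hmac.new(secret, repr((context, token)).encode(), hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52


def context_at(tokens: list, t: int) -> tuple:
    """The preceding tokens that seed position t."""
    return tuple(tokens[max(0, t - WINDOW):t])


def recover_v(tokens: list, secret: bytes) -> list:
    """V_t = U_{t, A_t}: needs only the key and the tokens, never the model."""
    return [keyed_uniform(secret, context_at(tokens, t), a) for t, a in enumerate(tokens)]


def generate(probs: dict, secret: bytes, n: int) -> list:
    """Toy Gumbel-watermarked generator with a fixed next-token distribution."""
    tokens = []
    for t in range(n):
        ctx = context_at(tokens, t)
        scores = {a: p / -math.log(keyed_uniform(secret, ctx, a)) for a, p in probs.items()}
        tokens.append(max(scores, key=scores.get))
    return tokens


if __name__ == "__main__":
    secret = b"secret-key"
    probs = {a: 1.0 / 50 for a in range(50)}  # high-entropy toy distribution
    marked = generate(probs, secret, 1000)
    rng = random.Random(0)
    human = [rng.randrange(50) for _ in range(1000)]  # produced without the key
    v_marked, v_human = recover_v(marked, secret), recover_v(human, secret)
    print(sum(v_marked) / len(v_marked), sum(v_human) / len(v_human))
```

On this high-entropy toy distribution, the recovered values for the watermarked sequence should average close to 1, while those for the key-independent sequence should average close to 1/2, which is the skew the uniformity test looks for.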

Power-Law Statistic: Theory and Bounds

The crux of the contribution is the introduction of a truncated power-law test statistic defined as:

$$S(V_t) = \min\left(\frac{1}{\sqrt{\epsilon}}, \frac{1}{\sqrt{1-V_t}}\right) - \mu.$$

Here, $\epsilon$ is a truncation level set in terms of a parameter that controls the false positive rate, and $\mu$ is a normalization constant ensuring $\mathbb{E}[S(V_t)] = 0$ when $V_t \sim \mathrm{Unif}[0,1]$. Detection is performed by checking whether the statistic aggregated over the $n$ tokens exceeds a threshold chosen so that the false positive rate remains at most the target level.
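The statistic can be sketched in Python as below. Two pieces are assumptions rather than the paper's choices: the illustrative values of $\epsilon$ and of the target false positive level, and the threshold, which here comes from Hoeffding's inequality (each centered score lies in an interval of width $1/\sqrt{\epsilon} - 1$). The constant $\mu$ follows from a direct calculation: with $W = 1 - V_t$ uniform, $\mathbb{E}[\min(\epsilon^{-1/2}, W^{-1/2})] = \epsilon \cdot \epsilon^{-1/2} + \int_\epsilon^1 w^{-1/2}\,dw = 2 - \sqrt{\epsilon}$. The helper `simulate_watermarked_v` (hypothetical) uses the fact that, when $P_t$ is uniform over $k$ tokens, the Gumbel rule yields $V_t = R^{1/k}$ with $R$ uniform.

```python
import math
import random


def mu(eps: float) -> float:
    """Null mean of min(eps^{-1/2}, (1 - V)^{-1/2}) for V ~ Unif[0, 1], i.e. 2 - sqrt(eps)."""
    return 2.0 - math.sqrt(eps)


def power_law_score(v: float, eps: float) -> float:
    """S(v) = min(1/sqrt(eps), 1/sqrt(1 - v)) - mu; the cap also covers v == 1."""
    cap = 1.0 / math.sqrt(eps)
    raw = cap if 1.0 - v <= eps else 1.0 / math.sqrt(1.0 - v)
    return raw - mu(eps)


def hoeffding_threshold(n: int, eps: float, delta: float) -> float:
    """Assumed threshold (not necessarily the paper's): scores lie in an interval of
    width 1/sqrt(eps) - 1, so under the null P(sum >= threshold) <= delta."""
    width = 1.0 / math.sqrt(eps) - 1.0
    return width * math.sqrt(n * math.log(1.0 / delta) / 2.0)


def detect(vs: list, eps: float, delta: float) -> bool:
    """Flag the text as watermarked if the summed centered scores reach the threshold."""
    total = sum(power_law_score(v, eps) for v in vs)
    return total >= hoeffding_threshold(len(vs), eps, delta)


def simulate_watermarked_v(k: int, n: int, rng: random.Random) -> list:
    """V_t under the Gumbel rule when P_t is uniform over k tokens: the winning value
    R = max_a U_a^{1/p_a} is uniform, and V_t = R^{p_{A_t}} = R^{1/k}."""
    return [rng.random() ** (1.0 / k) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    eps, delta, n = 1e-3, 0.01, 500  # illustrative values only
    null_vs = [rng.random() for _ in range(n)]
    marked_vs = simulate_watermarked_v(20, n, rng)
    print(detect(null_vs, eps, delta), detect(marked_vs, eps, delta))
```

The uniform sample is expected to pass and the simulated watermarked sample to be flagged. Hoeffding's bound is conservative for small $\epsilon$ because it uses only the range of the scores; a calibration that exploits the null variance would give a lower threshold.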

Upper Bound (Detection Guarantee)

The main theorem shows that, when the detector is applied to $n$ tokens with next-token distributions $P_1, \dots, P_n$, the probability of missing the watermark (the type II error) is bounded above whenever a condition on an entropy-like measure of $P_1, \dots, P_n$ is met. This measure aggregates the "randomness" present in the LLM's output, and it determines the number of tokens required for reliable detection.

Lower Bound (Limitation for All Model-Agnostic Schemes)

A minimax lower bound is also established, applicable to all valid selection kernels (including arbitrary watermarking schemes), showing that no model-agnostic detector can outperform the power-law method by more than logarithmic factors. If the next-token distributions $P_t$ are drawn i.i.d. from a symmetric law, then any detection method with small error requires a number of tokens matching the upper bound up to those logarithmic factors.

This establishes the near-optimality of the proposed detector in terms of sample complexity.

Comparisons and Practical Considerations

Comparison to Existing Methods

The analysis situates this method between Gumbel-based exponential detectors, whose detection time is characterized by the mean per-token entropy, and heavier-tailed alternatives. Under suitable conditions, the power-law statistic provably reduces the sample complexity relative to the exponential detector. However, experiments indicate that for real-world language distributions (where large-entropy regimes dominate), practical gains may be tempered by constant and logarithmic factors. Combining the new detector with previous methods via a union bound can sometimes yield higher detection probabilities in practice.
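A union-bound combination of the two detectors can be sketched as follows, pairing the power-law sum with the exponential score $\sum_t -\log(1 - V_t)$ commonly used with this scheme. Under the null, this score is a sum of $n$ independent Exp(1) variables, so its exact tail is that of a Gamma$(n, 1)$ distribution. Each detector runs at level $\delta/2$, so the combined false positive rate is at most $\delta$. The Hoeffding threshold for the power-law part, the function names, and the parameter values are assumptions, not the paper's calibration.

```python
import math
import random


def exp_score(vs):
    """Exponential score sum_t -log(1 - V_t); each term is Exp(1) under the null.
    1 - v is floored at 2**-53 to avoid log(0) from floating-point rounding."""
    return sum(-math.log(max(1.0 - v, 2.0**-53)) for v in vs)


def gamma_upper_tail(n, x):
    """P(Gamma(n, 1) >= x) = P(Poisson(x) < n), summed in log space for stability."""
    if x <= 0.0:
        return 1.0
    logs = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in logs))


def power_law_sum(vs, eps):
    """Sum of the centered truncated power-law scores."""
    cap, mu = 1.0 / math.sqrt(eps), 2.0 - math.sqrt(eps)
    return sum((cap if 1.0 - v <= eps else 1.0 / math.sqrt(1.0 - v)) - mu for v in vs)


def combined_test(vs, eps, delta):
    """Run both detectors at level delta / 2; flag if either rejects (union bound)."""
    n = len(vs)
    exp_flag = gamma_upper_tail(n, exp_score(vs)) <= delta / 2
    width = 1.0 / math.sqrt(eps) - 1.0
    tau = width * math.sqrt(n * math.log(2.0 / delta) / 2.0)  # Hoeffding at delta / 2
    pl_flag = power_law_sum(vs, eps) >= tau
    return exp_flag, pl_flag, exp_flag or pl_flag


if __name__ == "__main__":
    rng = random.Random(0)
    n, eps, delta = 400, 1e-3, 1e-6  # illustrative values only
    null_vs = [rng.random() for _ in range(n)]
    # Watermarked V_t = R^{1/k} when P_t is uniform over k = 50 tokens.
    marked_vs = [rng.random() ** (1.0 / 50) for _ in range(n)]
    print(combined_test(null_vs, eps, delta))
    print(combined_test(marked_vs, eps, delta))
```

Splitting $\delta$ evenly is the simplest choice; any split $\delta_1 + \delta_2 = \delta$ keeps the same guarantee and lets one favor whichever detector is expected to be stronger for the text at hand.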

Theoretical Implications

The theoretical innovation lies in characterizing detection times precisely as a function of the entropy spectrum across tokens. The work clarifies when undetectability is inevitable (e.g., for deterministic, low-entropy outputs) and quantifies detection time under mixtures of high and low entropy. These fine-grained distinctions are absent in analyses based solely on average entropy.
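To see why deterministic tokens carry no signal, the sketch below computes the expected centered power-law score of a single watermarked token as a function of its next-token distribution. It uses $V_t = R^{P_t(A_t)}$ with $R$ uniform and independent of $A_t$ (a property of the Gumbel sampling rule) and a midpoint rule over $R$. This per-token mean (`expected_drift`, a hypothetical helper) only illustrates how entropy enters; it is not the paper's entropy-like measure, and the value of $\epsilon$ is illustrative.

```python
import math
from collections import Counter


def expected_drift(probs, eps, grid=20000):
    """Mean centered power-law score of one watermarked token with next-token
    distribution `probs` (a list of probabilities summing to 1).

    V = R^{p_A} with R ~ Unif[0, 1] independent of A ~ probs; the integral over R
    uses a midpoint rule. Subtracting the null mean 2 - sqrt(eps) makes the
    result 0 exactly when V is uniform.
    """
    cap, mu = 1.0 / math.sqrt(eps), 2.0 - math.sqrt(eps)
    total = 0.0
    for p, count in Counter(probs).items():  # group equal probabilities
        if p <= 0.0:
            continue
        acc = 0.0
        for i in range(grid):
            v = ((i + 0.5) / grid) ** p
            acc += cap if 1.0 - v <= eps else 1.0 / math.sqrt(1.0 - v)
        total += count * p * acc / grid
    return total - mu


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    eps = 1e-3  # illustrative truncation level
    cases = [("deterministic", [1.0]), ("coin flip", [0.5, 0.5]), ("uniform-50", [1 / 50] * 50)]
    for name, probs in cases:
        print(name, round(entropy(probs), 3), round(expected_drift(probs, eps), 3))
```

A point mass gives zero drift, since then $V_t = R$ is exactly uniform; the coin flip gives a small positive drift and the 50-way uniform a much larger one. Because the drift averages over tokens, a text in which half the tokens are forced accumulates mean score at half the rate of one whose tokens are all high-entropy.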

Model (In)dependence

The detector's agnosticism to the LLM is a core strength: it requires neither access to model internals nor any prior on $P_t$; only the secret key is needed, to reconstruct the pseudorandom values. The analysis also suggests that further statistical power is achievable if side information, or even coarse surrogates for $P_t$, are available, which opens up the potential for learned hybrid detection mechanisms.

Directions for Future Research

Several directions are evident from the results:

  • Hybrid Detection: Leveraging low-fidelity models of $P_t$ for adaptive or weighted detection strategies.
  • Adaptive/Anytime Testing: Extending power-law detection to support variable-length or streaming contexts, following recent advances (Huang et al., 19 Feb 2026).
  • Robustness to Human Edits: Investigating resilience to adversarial or benign alterations, as in (He et al., 4 Oct 2025).
  • Optimal Heaviness: Exploring alternative statistics with even heavier tails or constructing fully adaptive test statistics based on empirical entropy observations.
  • Reduction of Logarithmic Gaps: Closing the remaining log-factor gaps between upper and lower sample complexity bounds.

Conclusion

This work presents a nearly optimal, model-agnostic detection method for Gumbel watermarking in LLMs, based on a carefully truncated power-law goodness-of-fit statistic. The upper and lower bounds, derived under minimal assumptions, clarify the efficiency limits of non-distortionary watermark detection and highlight the importance of entropy structure in practical watermark detection. Together they provide a firm basis for future advances in both watermarking theory and scalable, real-world LLM provenance verification.
