Published 31 Mar 2026 in cs.LG, cs.CR, and stat.ML | (2603.30017v1)
Abstract: We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.
The paper introduces a novel statistical detection method using a truncated power-law test to identify Gumbel watermarks without accessing the underlying LLM.
The paper establishes upper and lower bounds on sample complexity based on entropy conditions, ensuring near-optimal performance under varying token distributions.
The paper demonstrates improved efficiency over traditional exponential detectors, reducing detection time in high-entropy regimes while preserving non-distortionary properties.
Refined Detection for Gumbel Watermarking
Overview
This work provides a new statistical detection mechanism for the Gumbel watermarking scheme in LLMs, enhancing the efficiency of watermark detection while preserving the non-distortionary property. The proposed detection method exhibits near-optimal sample complexity in a problem-dependent sense, as established by new upper and lower bounds under precise entropy-based conditions on the sequence of next-token distributions.
Gumbel Watermarking Framework
The Gumbel watermarking approach operates by leveraging cryptographically seeded pseudorandomness to insert watermarks without altering the marginal distribution of the generated text. For each token $A_t$, pseudorandom uniforms $(U_{t,a})_{a \in \Sigma}$ are generated using a secret key and a hash over preceding tokens; the token is then sampled as
$$A_t = \arg\max_{a \in \Sigma} \frac{P_t(a)}{-\log U_{t,a}}.$$
This method ensures the output distribution remains $P_t$ while enabling later verification by reconstructing the pseudorandom variables using the secret key.
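The sampling rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the keyed function `prf_uniform` (HMAC-SHA256 over the context and token), the context window length, and the fixed toy distribution `probs` are all hypothetical choices made for the example. The code maximizes $\log(U_{t,a}) / P_t(a)$, which selects the same token as maximizing $P_t(a) / (-\log U_{t,a})$.

```python
import hashlib
import hmac
import math


def prf_uniform(key: bytes, context: tuple, token: int) -> float:
    """Hypothetical keyed PRF mapping (key, context, token) to a uniform in (0, 1)."""
    message = repr((context, token)).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (top52 + 0.5) / 2.0 ** 52  # strictly inside (0, 1)


def gumbel_sample(probs: list, key: bytes, context: tuple) -> int:
    """Pick argmax_a log(U_a) / P(a), equivalent to argmax_a P(a) / (-log U_a)."""
    best_token, best_score = -1, -math.inf
    for token, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability tokens are never selected
        score = math.log(prf_uniform(key, context, token)) / p
        if score > best_score:
            best_token, best_score = token, score
    return best_token


def generate(probs: list, key: bytes, length: int, window: int = 4) -> list:
    """Toy generator: the same next-token distribution is reused at every step."""
    tokens = []
    for _ in range(length):
        context = tuple(tokens[-window:])  # hash input: preceding tokens
        tokens.append(gumbel_sample(probs, key, context))
    return tokens
```

Because the uniforms depend only on the key and the context, calling `generate` twice with the same key returns the same sequence, while across distinct contexts the selected tokens follow `probs`.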
Model-Agnostic Detection: Statistical Formulation
The detection challenge is to distinguish watermarked text from non-watermarked text without access to either the underlying LLM or its distribution. Let $V_t = U_{t,A_t}$ for each token $A_t$. Under the null hypothesis (non-watermarked text), $(V_t)_{t=1}^{n}$ are i.i.d. uniform variables. Under watermarked text, their distribution deviates from uniformity. Detection, therefore, reduces to a statistical test for uniformity in $(V_t)$, implementable without model access.
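The following sketch shows how the values $V_t$ might be recomputed from a token sequence and the secret key alone. It assumes the same kind of hypothetical keyed function as a generator would use (here HMAC-SHA256 over a sliding context window); the names `prf_uniform`, `pivots`, and the `window` parameter are illustrative and not taken from the paper.

```python
import hashlib
import hmac


def prf_uniform(key: bytes, context: tuple, token: int) -> float:
    """Hypothetical keyed PRF mapping (key, context, token) to a uniform in (0, 1)."""
    message = repr((context, token)).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (top52 + 0.5) / 2.0 ** 52  # strictly inside (0, 1)


def pivots(tokens: list, key: bytes, window: int = 4) -> list:
    """Recompute V_t = U_{t, A_t} for each observed token; no model is queried."""
    values = []
    for t, token in enumerate(tokens):
        context = tuple(tokens[max(0, t - window):t])  # preceding tokens only
        values.append(prf_uniform(key, context, token))
    return values
```

For text produced without the key, the recomputed values behave like independent uniform draws, so any test for uniformity can be applied to the output of `pivots`.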
Power-Law Statistic: Theory and Bounds
The crux of the contribution is the introduction of a truncated power-law test statistic defined as:
$$S(V_t) = \min\left(\frac{1}{\epsilon}, \frac{1}{1 - V_t}\right) - \mu.$$
Here, $\epsilon$ is set as […], with […] controlling the false positive rate, and $\mu$ is a normalization constant ensuring […] when […]. Detection is performed by checking whether […], where […] ensures the false positive rate remains at most […].
Upper Bound (Detection Guarantee)
The main theorem shows that, when applied to $n$ tokens with next-token distributions $(P_t)_{t=1}^{n}$, the probability of a missed detection (type II error) is upper-bounded whenever
[…]
where
[…]
is an entropy-like measure aggregating the "randomness" present in the LLM's output. The required number of tokens for reliable detection is thus
[…]
Lower Bound (Limitation for All Model-Agnostic Schemes)
A minimax lower bound is also established, applicable to all valid selection kernels (including arbitrary watermarking schemes), showing that no model-agnostic detector can outperform the power-law method (up to logarithmic factors). If the next-token distributions $P_t$ are drawn i.i.d. from a symmetric law […], then any detection method with error […] requires
[…]
This establishes near-optimality of the proposed detector in sample-complexity.
Comparisons and Practical Considerations
Comparison to Existing Methods
The analysis situates this method between Gumbel-based exponential detectors, where detection time scales as […] for mean entropy […], and heavy-tailed alternatives. The power-law statistic reduces the sample complexity to […] under suitable conditions, representing a provable improvement. However, experiments indicate that for real-world language distributions (where large-entropy regimes dominate), practical gains may be tempered by constant and logarithmic factors. Combining the new detector with previous methods via a union bound can sometimes yield improved practical detection probabilities.
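The union-bound combination can be sketched as follows, assuming each detector reports a valid p-value. The exponential detector is taken here to be the sum of $-\log(1 - V_t)$, which is an assumption about what the summary means by "exponential detectors"; under the null each term is a standard exponential, so the sum of $n$ terms follows a Gamma$(n, 1)$ law with an exact tail formula. The function names are hypothetical.

```python
import math


def exponential_pvalue(vs: list) -> float:
    """Exact null tail of sum(-log(1 - V_t)) ~ Gamma(n, 1); requires every v < 1.

    P(Gamma(n, 1) >= x) = sum_{k=0}^{n-1} exp(-x) * x**k / k!
    Evaluated in log space to avoid underflow for large x.
    """
    n = len(vs)
    x = sum(-math.log1p(-v) for v in vs)  # log1p(-v) == log(1 - v)
    if x <= 0.0:
        return 1.0
    logs = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(l - m) for l in logs))


def union_bound_reject(p_values: list, alpha: float) -> bool:
    """Reject if any detector's p-value is at most alpha / (number of detectors).

    By the union bound, the combined false positive rate stays at most alpha.
    """
    return any(p <= alpha / len(p_values) for p in p_values)
```

With two detectors, each is run at level `alpha / 2`; the combined test pays a factor of two in the level but rejects whenever either detector alone would.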
Theoretical Implications
The theoretical innovation lies in characterizing detection times precisely as a function of the entropy spectrum across tokens. The work clarifies when undetectability is inevitable (e.g., for deterministic, low-entropy outputs) and quantifies detection time under mixtures of high and low entropy. These fine-grained distinctions are absent in analyses based solely on average entropy.
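The role of entropy can be illustrated with a small simulation. Under the Gumbel rule, the value $V_t$ attached to a token selected with probability $p$ is distributed as $W^{p}$ for a uniform $W$, so its mean is $1/(1 + p)$: a deterministic step ($p = 1$) yields a uniform $V_t$ with mean $0.5$, indistinguishable from the null, while a uniform choice among four tokens gives mean $0.8$. The sketch below uses true random uniforms in place of keyed pseudorandomness, which is an assumption made to keep the example short; the function names are hypothetical.

```python
import math
import random


def gumbel_pivot(probs: list, rng: random.Random) -> float:
    """Apply the Gumbel rule once and return V = U of the selected token."""
    best_u, best_score = 0.0, -math.inf
    for p in probs:
        if p <= 0.0:
            continue  # zero-probability tokens are never selected
        u = 1.0 - rng.random()  # uniform on (0, 1], safe for log
        score = math.log(u) / p  # same argmax as P(a) / (-log U_a)
        if score > best_score:
            best_u, best_score = u, score
    return best_u


def mean_pivot(probs: list, draws: int = 20000, seed: int = 0) -> float:
    """Average V over repeated draws from a fixed next-token distribution."""
    rng = random.Random(seed)
    return sum(gumbel_pivot(probs, rng) for _ in range(draws)) / draws


if __name__ == "__main__":
    print("deterministic:", mean_pivot([1.0, 0.0, 0.0, 0.0]))
    print("uniform over 4:", mean_pivot([0.25, 0.25, 0.25, 0.25]))
```

The gap between the two averages is the signal a model-agnostic detector has to work with, and it vanishes as the next-token distribution concentrates on a single token.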
Model (In)dependence
The detector's agnosticism to the LLM is a core strength: it requires neither access to model internals nor any prior on $P_t$, only the ability to reconstruct the pseudorandom variables from the secret key. The analysis also suggests further statistical power is achievable if side information or even coarse surrogates for $P_t$ are available, which opens up the potential for learned hybrid detection mechanisms.
Directions for Future Research
Several directions are evident from the results:
Hybrid Detection: Leveraging low-fidelity models of $P_t$ for adaptive or weighted detection strategies.
Adaptive/Anytime Testing: Extending power-law detection to support variable-length or streaming contexts, following recent advances (Huang et al., 19 Feb 2026).
Robustness to Human Edits: Investigating resilience to adversarial or benign alterations, as in (He et al., 4 Oct 2025).
Optimal Heaviness: Exploring alternative statistics with even heavier tails or constructing fully adaptive test statistics based on empirical entropy observations.
Reduction of Logarithmic Gaps: Closing the remaining log-factor gaps between upper and lower sample complexity bounds.
Conclusion
This work articulates a nearly optimal, model-agnostic detection method for Gumbel watermarking in LLMs, based on a theoretically refined truncation of power-law goodness-of-fit statistics. The provided upper and lower bounds, derived under minimal assumptions, clarify the efficiency limits of non-distortionary watermark detection and highlight the importance of entropy structure in practical watermark detection scenarios. This foundation provides a firm basis for future advancements in both watermarking theory and real-world scalable LLM provenance verification.