
Unbiased Watermarking Methods

Updated 5 October 2025
  • Unbiased watermarking methods are techniques that embed invisible signals into AI-generated content while preserving the original output distribution in expectation.
  • They utilize frameworks such as logit reweighting, multi-channel partitioning, sampling-based acceptance, and ensemble strategies to ensure imperceptibility and robustness.
  • These methods maintain high fidelity and compliance across text, image, and numerical applications, supporting reliable verification and regulatory alignment.

Unbiased watermarking methods are designed to embed detectable signals into data generated by models (notably LLMs and generative image models) such that the underlying output distribution is preserved exactly in expectation. Unlike biased watermarking, which often alters statistical or qualitative properties, unbiased watermarking is formulated to be completely invisible—an observer without the secret key cannot distinguish watermarked from non-watermarked content by analyzing distributional statistics. This property is crucial for ensuring imperceptibility, maintaining model performance, satisfying regulatory constraints, and facilitating scalable, high-quality provenance verification.

1. Principles of Unbiased Watermarking

Unbiased watermarking enforces the constraint that, for any context $x$, the watermarked probability distribution, averaged over all watermark key choices $k$ (or random seeds), equals that of the original model:

$$\mathbb{E}_k\big[P_W(\cdot \mid x, k)\big] = P_M(\cdot \mid x)$$

This principle ensures that, in expectation, inserting the watermark does not shift the likelihoods of any output, thereby maintaining output quality and semantic fidelity. Unbiased methods are further characterized by “distribution-preservation”—detectable signals are only statistically visible to an authorized verifier via hypothesis testing or aggregating subtle correlations over collections of outputs.
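As a concrete illustration of this constraint (a classical distortion-free sampling trick, not any one of the cited schemes), the exponential-minimum rule selects $\arg\max_i r_i^{1/p_i}$ from keyed uniforms $r_i$; averaged over keys, the selected token is distributed exactly according to $p$. A minimal Python sketch, with a hypothetical, non-cryptographic hash-based key derivation:

```python
import numpy as np

def keyed_uniforms(key: int, context: tuple, vocab_size: int) -> np.ndarray:
    # Hypothetical key derivation: seed a PRNG from (key, context).
    seed = hash((key, context)) % (2**32)
    return np.random.default_rng(seed).random(vocab_size)

def watermarked_sample(probs: np.ndarray, key: int, context: tuple) -> int:
    # Exponential-minimum trick, computed in log space: argmax_i r_i^(1/p_i).
    # Over i.i.d. Uniform(0,1) keys, the output is distributed exactly as `probs`.
    r = keyed_uniforms(key, context, len(probs))
    return int(np.argmax(np.log(r) / np.maximum(probs, 1e-12)))

# Monte Carlo check of E_k[P_W] = P_M over many keys:
probs = np.array([0.5, 0.3, 0.15, 0.05])
samples = [watermarked_sample(probs, key, ("ctx",)) for key in range(100_000)]
print(np.bincount(samples, minlength=4) / len(samples))  # ≈ [0.5, 0.3, 0.15, 0.05]
```

A verifier holding the key can recompute $r$ and test whether the observed tokens' $r$ values are suspiciously large; without the key, the distribution is unchanged in expectation.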

Recent advances formalize additional desiderata:

  • n-shot undetectability: For any number of generations, the joint distribution remains that of the original model in expectation (Hu et al., 2023).
  • Downstream invariance: Evaluation metrics (e.g., perplexity, BLEU, ROUGE) are preserved in mean across watermarked and non-watermarked generations (Wu et al., 28 Sep 2025).

2. Methodological Frameworks

Unbiased watermarking has been instantiated through several distinct methodologies:

  • Logit-based reweighting and partitioning: Methods such as $\delta$-reweight and $\gamma$-reweight construct reweighting functions that modify logits post-hoc but guarantee unbiasedness via formal expectation constraints (Hu et al., 2023, Chen et al., 16 Feb 2025). For instance, the $\gamma$-reweight method permutes vocabulary tokens, amplifying or compressing probabilities within segments so that over all keys, the original distribution is preserved.
  • Multi-channel partitioning: MCmark partitions the vocabulary into $l$ segments and, per watermark key, promotes sampling from a selected segment. Optimization guarantees are imposed:

$$\sum_{i=1}^{l} P_{i,V_j} = l\, P_{V_j} \quad \forall j$$

Each channel $P_i$ boosts its assigned segment to maximize detectability while preserving unbiasedness in aggregate (Chen et al., 16 Feb 2025).
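MCmark's actual channels come from a constrained optimization, but a simple closed-form family already satisfies both the per-channel normalization and the aggregate constraint above for an arbitrary keyed partition: $P_i(x) = P(x)\,(1 + \beta(\mathbf{1}[x \in V_i] - P_{V_i}))$ with $\beta \in (0, 1]$. A minimal sketch (an illustrative construction, not the paper's solver):

```python
import numpy as np

def multichannel_family(probs, segment_of, l, beta=1.0):
    """Channels P_i(x) = P(x) * (1 + beta * (1[x in V_i] - m_i)), m_i = P(V_i).
    Each channel normalizes, the uniform channel average equals P, and
    sum_i P_i(V_j) = l * P(V_j) holds for any partition."""
    m = np.array([probs[segment_of == i].sum() for i in range(l)])  # segment masses
    onehot = segment_of[None, :] == np.arange(l)[:, None]           # shape (l, vocab)
    return probs[None, :] * (1.0 + beta * (onehot - m[:, None])), m

# Sanity checks on a toy distribution with a keyed partition:
rng = np.random.default_rng(7)
probs, l = rng.dirichlet(np.ones(12)), 4
segment_of = rng.permutation(np.arange(12) % l)   # keyed token-to-segment assignment
P_i, m = multichannel_family(probs, segment_of, l)
assert np.allclose(P_i.sum(axis=1), 1.0)          # every channel is a distribution
assert np.allclose(P_i.mean(axis=0), probs)       # unbiased over a uniform channel choice
hits = np.array([[P_i[i, segment_of == j].sum() for j in range(l)] for i in range(l)])
assert np.allclose(hits.sum(axis=0), l * m)       # the aggregate constraint above
```

With $\beta = 1$, tokens outside the promoted segment are damped by the factor $1 - P_{V_i}$, which is what makes the promoted segment statistically visible to a key holder.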

  • Sampling-based acceptance: STA-1 exemplifies token-level unbiased watermarking by first sampling a candidate token, then accepting/rejecting based on its presence in a randomly assigned “green list”, with second-chance sampling for rejected candidates. This leads to the property:

$$\mathbb{E}_k\big[P_W(x_t \mid x_{1:t-1}; k)\big] = P_M(x_t \mid x_{1:t-1})$$

The method is particularly suited to low-entropy scenarios, reducing the risk of unsatisfactory outputs (Mao et al., 23 May 2024).
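A minimal token-level sketch of this accept-or-resample step follows; the hash-based green-list derivation is an assumption for illustration. Unbiasedness holds for any green-list fraction $\gamma$ because $\mathbb{E}_k[\mathbf{1}[z \in G_k]] + \mathbb{E}_k[P_M(G_k^c)] = \gamma + (1 - \gamma) = 1$:

```python
import numpy as np

def sta_sample(probs, key, context, gamma=0.5):
    """Sample a candidate; keep it if it lies in the keyed green list,
    otherwise draw one fresh token unconditionally (second chance).
    Averaged over keys, the output distribution equals `probs`."""
    seed = hash((key, context)) % (2**32)            # hypothetical key derivation
    green = np.random.default_rng(seed).random(len(probs)) < gamma
    sampler = np.random.default_rng()                # ordinary sampling randomness
    candidate = sampler.choice(len(probs), p=probs)
    if green[candidate]:
        return int(candidate)                        # accept a green candidate
    return int(sampler.choice(len(probs), p=probs))  # second-chance sample
```

Under any fixed key, green tokens are over-represented by the factor $1 + P_M(G^c)$, which a keyed detector can count, yet the average over keys is exactly $P_M$.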

  • Ensemble frameworks: ENS stacks multiple independent unbiased watermarking transforms, each governed by a distinct key, to amplify the aggregate signal for statistical detection without sacrificing unbiasedness:

$$\mathbb{E}_{k_{1:n}}\big[(W_{k_n} \circ \cdots \circ W_{k_1})\big(P_M(\cdot \mid x_{1:t})\big)\big] = P_M(\cdot \mid x_{1:t})$$

where $W_{k_i}$ denotes the $i$-th independently keyed unbiased transform in the stack.

This construction improves signal-to-noise ratio for detection and enhances robustness against perturbations (Wu et al., 28 Sep 2025).
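Stacking preserves unbiasedness by the tower property: if every keyed layer satisfies $\mathbb{E}_k[W_k(Q)] = Q$ for all input distributions $Q$, then so does the $n$-fold composition. A Monte Carlo sketch using a toy green-list layer (an assumed stand-in, not the ENS transforms themselves):

```python
import numpy as np

def green_layer(Q, rng, gamma=0.5):
    # Distribution-level view of one keyed accept/resample round:
    # Q_W(z) = Q(z) * (1[z in G] + Q(G^c)); unbiased over green lists G.
    green = rng.random(len(Q)) < gamma
    return Q * (green + Q[~green].sum())

P = np.array([0.5, 0.3, 0.15, 0.05])
rng = np.random.default_rng(0)
n_layers, trials = 3, 100_000
avg = np.zeros_like(P)
for _ in range(trials):          # average over independent key tuples k_{1:n}
    Q = P
    for _ in range(n_layers):    # stack n independent keyed layers
        Q = green_layer(Q, rng)
    avg += Q / trials
print(np.round(avg, 3))          # ≈ P: the composition remains unbiased
```

Each added layer contributes an independently keyed statistical signal for the detector to aggregate, which is the source of the improved signal-to-noise ratio.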

  • Bit-flip reweighting architectures: BiMark leverages symmetric binary coin-flips to partition and reweight vocabulary tokens multilayer-wise, supporting multi-bit message embedding while keeping output probabilities unbiased (Feng et al., 19 Jun 2025).
  • Personalized hash-based watermarking: PersonaMark embeds watermarks at the sentence-structure level using cryptographic hash functions, ensuring unbiasedness and scalability for multi-user attribution (Zhang et al., 15 Sep 2024).
  • Token-level and codebook-based watermarking for autoregressive models: IndexMark exploits codebook redundancy in VQ-VAE autoregressive image models, replacing indices with similar pairs and relying on statistical thresholds for detection while maintaining image fidelity (Tong et al., 20 May 2025).

3. Theoretical Guarantees and Analysis

Unbiased watermarking schemes are rigorously analyzed for:

  • Proof of unbiasedness: Most frameworks provide formal proofs or induction arguments guaranteeing that expected output distributions are unchanged (see, e.g., ENS (Wu et al., 28 Sep 2025), STA-1 (Mao et al., 23 May 2024), MCmark (Chen et al., 16 Feb 2025)).
  • Limits of unbiasedness: It is proved that no watermark can perfectly preserve output distribution under infinite repeated queries with a fixed key—distribution drift accumulates over batch generations, which becomes statistically detectable (Wu et al., 28 Sep 2025). The “single-prompt multiple-generation (SPMG)” metric quantifies multi-batch drift:

$$\Delta \mathrm{Met}(P_M, P_W) = \frac{1}{n} \sum_{i=1}^{n} \big| \overline{\mathrm{Met}}_i(P_M) - \overline{\mathrm{Met}}_i(P_W) \big|$$

where $\overline{\mathrm{Met}}_i$ denotes the mean of metric $\mathrm{Met}$ over $m$ generations for prompt $i$.
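The gap is straightforward to compute from per-prompt generation batches; a minimal sketch (the array shapes are assumptions):

```python
import numpy as np

def spmg_gap(met_base, met_wm):
    """SPMG drift: `met_base` and `met_wm` hold a metric (e.g., perplexity)
    with shape (n_prompts, m_generations) for unwatermarked vs. watermarked
    outputs; returns the mean absolute gap of per-prompt means."""
    return float(np.abs(met_base.mean(axis=1) - met_wm.mean(axis=1)).mean())
```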

  • Detection and statistical tests: Detection typically uses z-scores, binomial or normal approximations, and hypothesis testing. For example, in MCmark, counting tokens matching the promoted segment yields a score following a binomial null distribution, enabling low false-positive hypothesis tests (Chen et al., 16 Feb 2025); a minimal sketch of such a test follows this list.
  • Certified robustness: Theoretical robustness analyses bound the impact of token-level modification attacks. For any bb-edit attack, if the detector statistic exceeds threshold plus worst-case token effect, the watermark remains reliably detectable:

$$S(x) - \tau > b \cdot R_{\max} \cdot B \implies S(x') \geq \tau$$

for any $x'$ obtained by at most $b$ token modifications (Wu et al., 28 Sep 2025).
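For the segment-counting detector described above, the null distribution of hits is Binomial$(n, 1/l)$; a minimal sketch of the resulting one-sided test under the normal approximation:

```python
import math

def detect(hits, n_tokens, l):
    """z-score and one-sided p-value for `hits` tokens landing in the
    promoted segment out of `n_tokens`, under H0: hits ~ Binomial(n, 1/l)."""
    p0 = 1.0 / l
    z = (hits - n_tokens * p0) / math.sqrt(n_tokens * p0 * (1 - p0))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal approximation
    return z, p_value

print(detect(160, 400, 4))  # z ≈ 6.9, p ≈ 2e-12: confidently watermarked
```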

4. Application Domains: Text, Images, and Numerical Data

Unbiased watermarking spans a range of generative domains:

  • Text/LLMs: Unbiased watermarks are integrated into LLM token generation, preserving semantic metrics (perplexity, BLEU, ROUGE, BERTScore), maintaining naturalness for summarization and translation, and resisting detection or removal by attackers without keys (Hu et al., 2023, Mao et al., 23 May 2024, Zhang et al., 15 Sep 2024, Feng et al., 19 Jun 2025, Wu et al., 28 Sep 2025).
  • Images: For generative image models:
    • Diffusion models utilize initial Gaussian noise as an invisible watermark (distortion-free per image) and Fourier-based grouping for detection (Arabi et al., 5 Dec 2024).
    • Super-resolution models employ deterministic DDIM inversion for robust bit extraction under transformations and adaptive attacks (Hu et al., 13 Dec 2024).
    • VQ-VAE-led autoregressive models allow training-free codebook index watermarking via pairwise replacement (Tong et al., 20 May 2025).
    • These approaches consistently report near-perfect extraction accuracy (99.46% under standard distortions with SuperMark), high fidelity (PSNR, SSIM), and robustness against cropping, blurring, and compression.
  • Numerical data: Homomorphic encryption frameworks watermark encrypted numerical datasets, designing watermark sequences whose statistical aggregate (mean, variance) remains invariant; security is enforced through data obfuscation and brute-force infeasibility ($O(n!/(r!(n-r)!))$ candidate selections) for unauthorized detection (Alqarni, 2023).

5. Performance Evaluation and Benchmarks

Standardized benchmarks such as UWbench have been established to systematically compare unbiased watermarking methods:

  • Unbiasedness: Evaluated via metrics such as the SPMG gap, a calibrated detection statistic (DetWmk), and downstream task metrics; distributions are compared over single and multiple samples (Wu et al., 28 Sep 2025).
  • Detectability: True positive rates at fixed false positive rates, median p-values, and AUROC are measured (often over large-scale datasets and multiple LLMs).
  • Robustness: Attacks span paraphrasing, back-translation, and token-level modification, with certified bounds and empirical resilience reported.
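Reported TPR-at-fixed-FPR figures amount to thresholding the detector statistic at an empirical quantile of scores from unwatermarked text; a minimal sketch:

```python
import numpy as np

def tpr_at_fpr(neg_scores, pos_scores, fpr=0.01):
    """Set the threshold at the (1 - fpr) quantile of unwatermarked
    (negative) scores, then measure the fraction of watermarked
    (positive) scores that exceed it."""
    tau = np.quantile(neg_scores, 1.0 - fpr)
    return float((np.asarray(pos_scores) > tau).mean())
```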

Select empirical comparisons from recent literature:

| Method | Unbiasedness Guarantee | Detectability (TPR @ FPR = 1%) | Robustness |
|---|---|---|---|
| $\gamma$-reweight (Hu et al., 2023) | Yes | Moderate | Vulnerable for very short texts |
| MCmark (Chen et al., 16 Feb 2025) | Yes | High (+10% over prior SOTA) | Maintains TPR under token attacks |
| ENS (Wu et al., 28 Sep 2025) | Yes (ensemble stacking) | Superior for short texts | Robust to smoothing/paraphrasing |
| BiMark (Feng et al., 19 Jun 2025) | Yes (multi-layer, multi-bit) | 30% better for short texts | Extraction rate robust vs. MPAC |
| PersonaMark (Zhang et al., 15 Sep 2024) | Yes (per-user hash, sentence-level) | High | Scaling: detection time linear |
| STA-1 (Mao et al., 23 May 2024) | Yes | High in low-entropy tasks | Robust against local token edits |
| IndexMark (Tong et al., 20 May 2025) | Yes (pairwise codebook) | Near-perfect image accuracy | High against cropping, noise |
| SuperMark (Hu et al., 13 Dec 2024) | Yes (diffusion SR reversibility) | ~99.5% under distortions | ≥89% under adaptive attacks |
| WIND (Arabi et al., 5 Dec 2024) | Yes (distortion-free, two-stage) | High (>0.9 avg accuracy) | Resilient to forgery, removal |

6. Limitations, Trade-offs, and Open Theoretical Questions

The field recognizes important limitations and trade-offs:

  • Detection vs. robustness vs. capacity: Increasing the statistical signal (e.g., via more partitions, deeper ensembles, or more bits) often reduces practical robustness to adversarial modification and can require longer sample sequences for reliable detection.
  • Accumulation of bias under repeated queries: Even unbiased schemes eventually drift statistically under infinite repeated fixed-key generations, violating perfect unbiasedness in the multi-batch regime (Wu et al., 28 Sep 2025).
  • Certified robustness applies to token-level edits rather than paraphrasing: empirical evidence and formal bounds show that robustness assessed via paraphrasing is highly variable and less reproducible than analysis under controlled token-edit attacks.
  • Scalability: Algorithms such as PersonaMark address multi-user scalability with efficient hash-driven partitioning and detection.
  • Efficiency: For large candidate sets (e.g., $N = 10^5$ in image watermarking), detection runtime approaches (WIND, SuperMark) use grouping, hierarchical search, and efficient indexing to enable tractable verification.

7. Practical Implications and Regulatory Alignment

Unbiased watermarking methods are increasingly relevant for compliance with regulatory initiatives such as the EU AI Act and US guidelines on synthetic content provenance. The model-agnostic detection property (relying on token counting or output statistics rather than internal model logits) facilitates independent verification and aligns with practical deployment requirements—fidelity preservation, widespread composability, and privacy-preserving multi-user tracking.

Watermarking frameworks now support embedding multi-bit messages, time stamps, user identifiers, and model identity, all without compromising downstream utility. Robust benchmarks and open-source toolkits (UWbench) are available for standardized comparison and principled evaluation.


In summary, unbiased watermarking constitutes a mathematically and statistically rigorous paradigm for content attribution and provenance verification in AI-generated data. It balances invisibility, detectability, robustness, and fidelity, offering theoretical guarantees and empirical performance validated by recent research and benchmark efforts. While open problems related to infinite-query bias and practical robustness remain, the corpus of current work provides foundational techniques for trustworthy and regulation-ready generative AI systems.
