Left-Right Diffusion Watermarking
- Left-Right Diffusion Watermarking (LR-DWM) is a framework for diffusion language models that embeds watermark signals using both left and right neighbor tokens.
- It employs cryptographic hash functions and independent logit biasing to inject order-agnostic watermarks with minimal computational and memory overhead.
- Experimental evaluations demonstrate high detection rates with low perplexity impact and robust performance against typical deletion and substitution attacks.
Left-Right Diffusion Watermarking (LR-DWM) is a watermarking framework designed specifically for Diffusion LLMs (DLMs), enabling robust statistical detection and attribution of AI-generated text with minimal computational and memory overhead. Unlike previous watermarking systems adapted to autoregressive (AR) LLMs, LR-DWM is tailored for the non-sequential, iterative denoising process of DLMs and leverages locally available token context, notably both left and right neighbors, to inject order-agnostic watermark signals (Raban et al., 18 Jan 2026).
1. Motivation and Technical Foundations
Autoregressive watermarking mechanisms rely on the left-to-right, sequential generation characteristic of AR LLMs: for each token position $i$, a hash of the preceding context determines a “green set” (a balanced subset of the vocabulary), whose members are promoted via logit biasing during generation. However, DLMs generate text by iteratively refining tokens in parallel or in arbitrary position order, so many neighbors remain undecided at any update step, rendering AR-style hashing inapplicable without extensive architectural changes or process inversion (Raban et al., 18 Jan 2026; Gloaguen et al., 29 Sep 2025).
The key insight of LR-DWM is to use each finalized neighbor, as soon as it becomes available during denoising, as a local anchor for watermark constraints. Each token position $i$ applies independent logit biases according to the hash-derived green sets formed by the revealed left ($y_{i-1}$) and right ($y_{i+1}$) neighbors. These additive constraints allow effective watermarking regardless of generation order and without dependence on central memory or expectation steps.
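As an illustration, a keyed cryptographic hash can deterministically derive a green set from a single neighbor token. The sketch below uses BLAKE2 keyed hashing to seed a PRNG that selects a fixed fraction of the vocabulary; this is a hypothetical construction, since the exact hash and selection procedure are not fixed here.

```python
import hashlib
import random

def green_set(neighbor_token: int, key: bytes, vocab_size: int,
              gamma: float = 0.25) -> set:
    """Derive a pseudorandom green set of size gamma*|V| from one revealed
    neighbor token and a secret key (illustrative hash/PRNG choice)."""
    digest = hashlib.blake2b(neighbor_token.to_bytes(4, "big"), key=key).digest()
    rng = random.Random(digest)          # keyed hash seeds the selection
    k = int(gamma * vocab_size)
    return set(rng.sample(range(vocab_size), k))

# Same neighbor, different keys: independent-looking left and right sets.
left = green_set(42, key=b"k_L", vocab_size=1000)
right = green_set(42, key=b"k_R", vocab_size=1000)
```

Because the set depends only on the neighbor token and the key, generator and detector reconstruct identical green sets without any shared state beyond the keys.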
2. Mathematical Formulation
Let $y = (y_1, \dots, y_n)$ denote the token sequence and $V$ the vocabulary. DLMs proceed over $S$ denoising steps; at step $s$, a subset of positions $I_s \subseteq \{1, \dots, n\}$ is selected for update. For each $i \in I_s$, the unwatermarked logits $\ell_i \in \mathbb{R}^{|V|}$ define the conditional distribution:

$$p(y_i = v) = \mathrm{softmax}(\ell_i)_v = \frac{\exp(\ell_{i,v})}{\sum_{u \in V} \exp(\ell_{i,u})}.$$

If $y_{i-1}$ or $y_{i+1}$ is finalized, cryptographic hash functions parameterized by secret keys $k_L$ and $k_R$ generate green sets:

$$G_L = G(y_{i-1}, k_L) \subset V, \qquad G_R = G(y_{i+1}, k_R) \subset V,$$

each containing a fixed fraction $\gamma$ of the vocabulary. Watermark insertion occurs by adding bias parameters $\delta_L$ and $\delta_R$ (typically both set to a strength hyperparameter $\delta$) to logits indexed by green-set membership:

$$\tilde\ell_{i,v} = \ell_{i,v} + \delta_L\,\mathbb{1}[v \in G_L] + \delta_R\,\mathbb{1}[v \in G_R].$$

The watermarked sampling distribution thus becomes:

$$\tilde p(y_i = v) = \frac{\exp(\tilde\ell_{i,v})}{\sum_{u \in V} \exp(\tilde\ell_{i,u})},$$

where $\mathbb{1}[v \in G_L]$ and $\mathbb{1}[v \in G_R]$ are indicator functions for set membership.

Statistical detectability is quantified post-generation. Each token $y_i$ yields match indicators $m_L(i) = \mathbb{1}[y_i \in G(y_{i-1}, k_L)]$ and $m_R(i) = \mathbb{1}[y_i \in G(y_{i+1}, k_R)]$, combined into a ternary score:

$$s_i = m_L(i) + m_R(i) \in \{0, 1, 2\}.$$

Aggregating over the sequence and normalizing by an empirically calibrated variance $\hat\sigma^2$ from human texts gives the detection statistic:

$$Z = \frac{\sum_{i=1}^{n} (s_i - 2\gamma)}{\hat\sigma\sqrt{n}}.$$

Under the null hypothesis $H_0$ (unwatermarked/human text), $Z$ is approximately standard normal, permitting precise control of false positive rates.
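Given the per-token scores, detection reduces to a single pass over the text. The sketch below takes the green-set fraction and the human-text-calibrated standard deviation as given; the numeric values are illustrative assumptions, not from the paper.

```python
import math

def z_statistic(scores, gamma: float, sigma: float) -> float:
    """Z = sum_i (s_i - 2*gamma) / (sigma * sqrt(n)), where each score
    s_i in {0, 1, 2} counts neighbor green-set matches for token i."""
    n = len(scores)
    return sum(s - 2 * gamma for s in scores) / (sigma * math.sqrt(n))

# Watermarked-looking text: most tokens match both neighbor green sets.
z_watermarked = z_statistic([2] * 180 + [1] * 20, gamma=0.25, sigma=0.7)
# Human-looking text: matches occur at the chance rate 2*gamma per token.
z_human = z_statistic([1, 0] * 100, gamma=0.25, sigma=0.7)
```

Thresholding the statistic (e.g., flagging only large positive values) then bounds the false positive rate by the standard normal tail.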
3. Algorithmic Workflow and Implementation
LR-DWM is realized with minimal overhead and does not require caching or process inversion. The pseudocode below summarizes the insertion process in DLM decoding (Raban et al., 18 Jan 2026).
```
Input:  prompt c, diffusion model M, secret keys k_L, k_R, bias δ
Output: generated sequence y

Initialize y^{(S)} by corrupting the prompt-conditioned token sequence
for s = S downto 1:
    I_s ← positions selected for update at step s
    for i in I_s:
        ℓ ← M.logits(y^{(s)}, i)
        if y^{(s)}_{i-1} is revealed:
            G_L ← G(y^{(s)}_{i-1}, k_L)
            ℓ[v] += δ for all v in G_L
        if y^{(s)}_{i+1} is revealed:
            G_R ← G(y^{(s)}_{i+1}, k_R)
            ℓ[v] += δ for all v in G_R
        y_i ∼ Softmax(ℓ), or deterministic decode
    y^{(s-1)} ← updated sequence
return y^{(0)}
```
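The loop above can be exercised end-to-end in a toy setting. In the sketch below, uniform zero logits stand in for a real diffusion model's predictions and a keyed BLAKE2 hash (a hypothetical choice) implements the green-set function G, so only the watermark mechanics, not the language modeling, are meaningful.

```python
import hashlib
import math
import random

VOCAB, GAMMA, DELTA = 50, 0.25, 4.0

def green_set(tok: int, key: bytes) -> set:
    """Keyed-hash green set for one revealed neighbor (illustrative)."""
    digest = hashlib.blake2b(tok.to_bytes(4, "big"), key=key).digest()
    return set(random.Random(digest).sample(range(VOCAB), int(GAMMA * VOCAB)))

def denoise(n: int, steps: int, seed: int = 0) -> list:
    """Toy LR-DWM insertion loop over an arbitrary update order."""
    rng = random.Random(seed)
    y = [None] * n                        # None marks a still-masked position
    order = list(range(n))
    rng.shuffle(order)                    # non-sequential position order
    per_step = math.ceil(n / steps)
    for s in range(steps):
        for i in order[s * per_step:(s + 1) * per_step]:
            logits = [0.0] * VOCAB        # stand-in for M.logits(y, i)
            if i > 0 and y[i - 1] is not None:       # left anchor revealed
                for v in green_set(y[i - 1], b"k_L"):
                    logits[v] += DELTA
            if i < n - 1 and y[i + 1] is not None:   # right anchor revealed
                for v in green_set(y[i + 1], b"k_R"):
                    logits[v] += DELTA
            weights = [math.exp(l) for l in logits]  # softmax sampling
            y[i] = rng.choices(range(VOCAB), weights=weights)[0]
    return y
```

Scoring the output against the same keyed green sets then recovers far more neighbor matches than the chance rate $2\gamma$ per token, regardless of the order in which positions were revealed.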
4. Detection, Guarantees, and Analytical Properties
Under $H_0$, neighbor-anchored green sets for each token position are indistinguishable from random assignment; match indicators thus follow independent Bernoulli($\gamma$) distributions, and thresholding the calibrated $Z$ controls the FPR through the standard normal tail. Under the watermarking hypothesis $H_1$, each available neighbor increases the match probability by an amount linear in $\delta$ for small $\delta$; to first order, the predicted Z-statistic mean shift is

$$\mathbb{E}[Z \mid H_1] \approx \frac{\bar{a}\,\gamma(1-\gamma)\,\delta\,\sqrt{n}}{\hat\sigma},$$

where $\bar{a} \in [0, 2]$ is the average number of revealed neighbors per position, from which statistical power follows.
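These predictions can be checked numerically. Under a simplifying assumption of uniform base logits (an illustration, not the paper's derivation), boosting a $\gamma$-fraction green set by $\delta$ lifts the match probability from $\gamma$ to $\gamma e^{\delta} / (\gamma e^{\delta} + 1 - \gamma)$:

```python
import math

def match_prob(gamma: float, delta: float) -> float:
    """P(sampled token lands in the green set), uniform base logits."""
    return gamma * math.exp(delta) / (gamma * math.exp(delta) + 1 - gamma)

def z_mean_shift(n: int, gamma: float, delta: float,
                 avail: float = 2.0, sigma: float = 0.7) -> float:
    """Predicted E[Z | H1]; 'avail' = mean revealed neighbors per token."""
    return avail * (match_prob(gamma, delta) - gamma) * math.sqrt(n) / sigma

# For small delta the lift linearizes to gamma * (1 - gamma) * delta.
small_lift = match_prob(0.25, 0.01) - 0.25
```

Even moderate settings (e.g., a few hundred tokens) yield a mean shift of many standard deviations under this model, consistent with the high detection rates reported below.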
Complexity analysis shows both insertion and detection are computationally efficient: LR-DWM adds only keyed-hash green-set computation and additive biasing per updated position, with no lookup tables or expectation steps as required by prior DLM watermarking variants (Raban et al., 18 Jan 2026). Practical watermark detectors apply Z-thresholding to sequences of a few hundred tokens.
5. Experimental Evaluation
Experiments conducted on LLaDA-8B-Instruct and DREAM-7B-Instruct using the WaterBench dataset (600 prompts, with fixed sequence length and number of denoising steps) demonstrate that LR-DWM achieves high watermark detection rates with negligible quality impact (Raban et al., 18 Jan 2026). Performance metrics include:
- Wall-clock and memory overhead: LR-DWM is nearly identical to the unwatermarked baseline; WM-DLM incurs additional runtime overhead, and DMARK doubles GPU memory.
- Perplexity and detectability trade-off (PPL ± SEM at matched detection rates; lower PPL is better):

| Detection Rate | WM-DLM (PPL ± SEM) | DMARK (PPL ± SEM) | LR-DWM (PPL ± SEM) |
|---|---|---|---|
| 90% | 5.07 ± 1.40 | 2.82 ± 0.51 | 2.80 ± 0.46 |
| 99% | 6.14 ± 1.84 | 3.28 ± 0.61 | 3.32 ± 0.65 |
| 99.5% | 6.33 ± 1.90 | 3.34 ± 0.63 | 3.37 ± 0.66 |
LR-DWM matches DMARK's detection-vs-quality profile and achieves markedly lower perplexity than WM-DLM at high detection rates.
- Robustness: LR-DWM achieves 98.8% detection under 10% random deletion, 99.4% under 10% BERT-based substitution, but only 15.8% under paraphrasing, a limitation common to local-context watermarks.
Additional trials across a range of sequence lengths show high detection rates at fixed FPR, with log-perplexity increases under $0.05$ and expert judgments indicating unchanged style, accuracy, and ethics (Gloaguen et al., 29 Sep 2025).
6. Limitations, Context, and Future Directions
LR-DWM exhibits a fundamental trade-off between watermark strength ($\delta$) and fluency as measured by perplexity. Very large $\delta$ values may be unsuitable for sensitive applications that demand high output quality. Reliable statistical detection requires text strings of a few hundred tokens or more.
Robustness to paraphrase remains an open challenge, as local context constraints are disrupted by extensive rephrasing. The method is resilient to typical deletion and substitution attacks, but not global rewrites. Ongoing research seeks to integrate wider context (e.g., multi-token anchors) and adaptive strength parameters based on position-level confidence. Extensions to more syntax-aware or global watermark designs are proposed.
LR-DWM constitutes an efficient, two-sided watermarking method for DLMs by exploiting whichever neighbor tokens are available during generation. The scheme achieves competitive detection-vs-quality trade-offs and preserves both computational and memory efficiency relative to non-watermarked baselines, marking a substantial advance in the domain of diffusion-model watermarking (Raban et al., 18 Jan 2026, Gloaguen et al., 29 Sep 2025).