SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark

Published 31 Mar 2026 in cs.CV and cs.CR | (2603.29742v2)

Abstract: Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose $\underline{\mathbf{S}}$tochastic $\underline{\mathbf{Hi}}$dden-Trajectory De$\underline{\mathbf{f}}$lec$\underline{\mathbf{t}}$ion ($\mathbf{SHIFT}$), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%--100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.


Authors (6)

Summary

The paper presents the SHIFT attack, which disrupts deterministic inversion using partial forward diffusion and stochastic reverse sampling to remove watermarks.
It achieves a 95–100% attack success rate across nine watermark schemes while maintaining high semantic fidelity and low FID scores.
The study exposes vulnerabilities in diffusion watermarking and calls for trajectory-agnostic verification methods for robust AI content provenance.

SHIFT: Stochastic Hidden-Trajectory Deflection for Watermark Removal in Diffusion Models

Overview and Motivation

The proliferation of high-fidelity diffusion-based image synthesis has created an urgent need for reliable provenance of AI-generated content, and diffusion watermarking has emerged as a principal paradigm for robust embedding and verification. Conventional watermark-removal attacks (regeneration, pixel-space perturbation, and latent-space optimization) either fail to remove deeply coupled semantic watermarks or incur prohibitive computational cost. This paper introduces "SHIFT: Stochastic Hidden-Trajectory Deflection," a training-free watermark-removal attack. It works by breaking the trajectory-consistency assumption that underpins diffusion watermark verifiers across paradigms.

Methodology: Trajectory Deflection via Stochastic Sampling

SHIFT operationalizes watermark removal by exploiting stochasticity in reverse diffusion. The core insight is that watermark verification fundamentally relies on deterministic reversibility, wherein the trajectory between the initial noise and the final image remains reconstructable via inversion (typically DDIM). SHIFT disrupts this dependency in two stages:

Partial Forward Diffusion: The watermarked image is encoded into the latent space and subjected to controlled re-noising. The parameter $\lambda$ sets the re-noising strength, i.e., how deep into the forward diffusion process the latent is pushed. Deeper re-noising attenuates more trajectory-specific information while preserving the semantic scaffold of the image.
Stochastic Reverse Resampling: From the partially noised latent, reverse diffusion is performed with an ancestral stochastic sampler. This sampler injects fresh Gaussian noise at every step, a discretization of the Brownian motion that drives the reverse-time SDE. Unlike deterministic DDIM, stochastic sampling decouples the reconstructed image from the original watermark-embedded trajectory, because each reverse sampling run traces a distinct path in latent space.
Figure 1: The overall framework of SHIFT, incorporating partial forward diffusion and stochastic reverse resampling to deflect watermark-carrying trajectories.
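The two stages can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for the sketch:

- the linear DDPM noise schedule;
- the `toy_eps_model` noise predictor, which is exact only for toy latents drawn from N(0, I);
- the variance choice σ_t² = β_t for the ancestral step;
- the mapping t* = int(λT).

The VAE encoding and decoding around the latent are omitted. In practice, `eps_model` would be the noise predictor of a public latent diffusion model.

```python
import numpy as np

# Assumed DDPM-style linear schedule (hypothetical); ab[0] = 1 is the clean latent,
# ab[t] = alpha_bar_t for t = 1..T.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
ab = np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def toy_eps_model(x, t):
    """Hypothetical stand-in for a pretrained noise predictor eps_theta(x_t, t).

    Exact E[eps | x_t] only for toy latents z0 ~ N(0, I); a real attack would call
    the U-Net of a public latent diffusion model here.
    """
    return np.sqrt(1.0 - ab[t]) * x


def shift_attack(z0, lam, eps_model=toy_eps_model, rng=None):
    """Sketch of SHIFT-style trajectory deflection on a latent z0 (assumptions above).

    Stage 1: partial forward diffusion to t* = int(lam * T).
    Stage 2: ancestral (stochastic) DDPM reverse sampling from t* back to 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    t_star = int(lam * T)
    if t_star == 0:
        return z0.copy()  # lam = 0: no re-noising, so nothing is deflected
    # Stage 1: sample x_{t*} ~ q(x_{t*} | z0) in one shot.
    x = np.sqrt(ab[t_star]) * z0 + np.sqrt(1.0 - ab[t_star]) * rng.standard_normal(z0.shape)
    # Stage 2: ancestral steps x_t -> x_{t-1} with sigma_t^2 = beta_t.
    for t in range(t_star, 0, -1):
        beta = 1.0 - ab[t] / ab[t - 1]
        eps = eps_model(x, t)
        mean = (x - beta / np.sqrt(1.0 - ab[t]) * eps) / np.sqrt(1.0 - beta)
        # Fresh noise at each step is what sends every run down a different path.
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0  # no noise on the last step
        x = mean + np.sqrt(beta) * noise
    return x


rng = np.random.default_rng(0)
z_wm = rng.standard_normal((4, 32, 32))  # stand-in for a watermarked latent
z_attacked = shift_attack(z_wm, lam=0.5, rng=rng)
```

With `lam = 0`, the latent is returned unchanged. Larger `lam` pushes t* toward T, so less of the original latent survives before the stochastic reverse pass.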

No watermark knowledge, retraining, or adversarial optimization is needed. SHIFT simply uses any publicly available latent diffusion model.

Theoretical Guarantees and Analysis

A Wasserstein-distance analysis supports the trajectory-decoupling claim. Formally, the noise recovered from the attacked sample is shown to be approximately independent of the original watermark-carrying noise. The bound is governed by cumulative Lipschitz constants and signal-retention coefficients, and it quantifies the degree of decoupling as a function of attack strength. The theoretical results indicate that sufficiently deep forward diffusion, combined with stochastic reverse resampling, drives the recovered noise toward statistical independence from the watermark. Verification therefore fails regardless of the embedding paradigm.
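The paper's bound is not reproduced here. As a much simpler worked illustration of why a signal-retention coefficient governs decoupling, take the standard DDPM forward step together with a toy assumption (not from the paper): clean latents are standard Gaussian, $z_0 \sim \mathcal{N}(0, I)$. Under that assumption:

- The probability-flow ODE has zero drift, so deterministic resampling returns $z_{t^*}$ unchanged.
- Exact stochastic resampling instead draws from the posterior $p(z_0 \mid z_{t^*})$.

$$
\begin{aligned}
z_{t^*} &= \sqrt{\bar\alpha_{t^*}}\,z_0 + \sqrt{1-\bar\alpha_{t^*}}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\\
\text{deterministic: } \tilde z_0 &= z_{t^*} \;\Rightarrow\; \operatorname{corr}(\tilde z_0, z_0) = \sqrt{\bar\alpha_{t^*}},\\
\text{stochastic: } \tilde z_0 &\sim p(z_0 \mid z_{t^*}), \qquad \mathbb{E}[z_0 \mid z_{t^*}] = \sqrt{\bar\alpha_{t^*}}\,z_{t^*},\\
\operatorname{Cov}(\tilde z_0, z_0) &= \mathbb{E}\big[\mathbb{E}[z_0 \mid z_{t^*}]\,z_0^{\top}\big] = \sqrt{\bar\alpha_{t^*}}\,\mathbb{E}\big[z_{t^*} z_0^{\top}\big] = \bar\alpha_{t^*} I \;\Rightarrow\; \operatorname{corr}(\tilde z_0, z_0) = \bar\alpha_{t^*}.
\end{aligned}
$$

Since $0 < \bar\alpha_{t^*} < 1$, we have $\bar\alpha_{t^*} < \sqrt{\bar\alpha_{t^*}}$. In this toy, stochastic resampling therefore retains strictly less correlation with the original latent than deterministic resampling at the same depth. Both correlations approach zero as $t^* \to T$ (i.e. $\lambda \to 1$). DDIM generation and inversion are pure rescalings in this toy, so the same correlations carry over to the recovered noise versus the watermark-carrying noise.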

Crucially, deterministic resampling (e.g., DDIM) preserves residual watermark structure due to the many-to-one mapping problem in latent space; stochasticity is indispensable for complete trajectory deflection.
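The deterministic-versus-stochastic contrast can be simulated in a toy setting. The simulation measures how much of the watermark-carrying initial noise survives DDIM inversion after each kind of resampling. Everything here is an illustrative assumption rather than the paper's protocol:

- latents are standard Gaussian;
- `eps_model` is the closed-form noise predictor for that toy;
- the "watermark" is simply the initial noise `z_T`;
- verification is reduced to the cosine similarity between `z_T` and the DDIM-inverted noise.

```python
import numpy as np

# Toy setup (assumption): linear DDPM schedule; ab[0] = 1 is the clean latent.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
ab = np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def eps_model(x, t):
    # Closed-form E[eps | x_t] for toy latents z0 ~ N(0, I); stands in for a U-Net.
    return np.sqrt(1.0 - ab[t]) * x


def ddim_step(x, t, s):
    # Deterministic DDIM (eta = 0) move from level t to level s, in either direction.
    eps = eps_model(x, t)
    x0_hat = (x - np.sqrt(1.0 - ab[t]) * eps) / np.sqrt(ab[t])
    return np.sqrt(ab[s]) * x0_hat + np.sqrt(1.0 - ab[s]) * eps


def ddim_generate(x, t_start=T):
    for t in range(t_start, 0, -1):
        x = ddim_step(x, t, t - 1)
    return x


def ddim_invert(z0):
    x = z0
    for t in range(0, T):
        x = ddim_step(x, t, t + 1)
    return x


def ancestral_resample(x, t_star, rng):
    # DDPM ancestral sampling from level t_star to 0 with sigma_t^2 = beta_t.
    for t in range(t_star, 0, -1):
        beta = 1.0 - ab[t] / ab[t - 1]
        mean = (x - beta / np.sqrt(1.0 - ab[t]) * eps_model(x, t)) / np.sqrt(1.0 - beta)
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + np.sqrt(beta) * noise
    return x


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def watermark_similarity(lam, stochastic, d=16384, seed=0):
    """Cosine between watermark noise z_T and the noise recovered after resampling."""
    rng = np.random.default_rng(seed)
    z_T = rng.standard_normal(d)  # stand-in for watermark-carrying initial noise
    z0 = ddim_generate(z_T)       # "watermarked" latent
    t_star = int(lam * T)
    if t_star > 0:
        x = np.sqrt(ab[t_star]) * z0 + np.sqrt(1.0 - ab[t_star]) * rng.standard_normal(d)
        z0 = ancestral_resample(x, t_star, rng) if stochastic else ddim_generate(x, t_star)
    return cosine(ddim_invert(z0), z_T)


for lam in (0.0, 0.5, 0.9):
    print(lam, watermark_similarity(lam, False), watermark_similarity(lam, True))
```

Under these assumptions, the similarity without attack is essentially 1. At λ = 0.5, deterministic resampling should leave roughly $\sqrt{\bar\alpha_{t^*}}$ of the watermark signal (about 0.28 for this schedule), while ancestral resampling should leave roughly $\bar\alpha_{t^*}$ (about 0.08). Real verifiers and pretrained models are far more complex than this toy.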

Empirical Evaluation

Experimental assessment spans nine watermarking schemes—noise-space (Tree-Ring, RingID, PRC, WIND), frequency-domain (Gaussian Shading, GaussMarker, SFW), and optimization-driven (ROBIN, SEAL)—with quantitative and qualitative evaluation on semantic fidelity (CLIP score), distributional quality (FID), and attack success rate (ASR).
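FID compares Gaussian fits of image features. Below is a minimal NumPy sketch of the Fréchet-distance term. It assumes the Inception-v3 features have already been extracted (the extractor is not shown). It also uses the identity that the trace of $(\Sigma_a \Sigma_b)^{1/2}$ equals the sum of the square roots of the eigenvalues of $\Sigma_a \Sigma_b$. This illustrates the metric and is not the paper's evaluation code.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two feature sets (rows = samples).

    d^2 = ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Eigenvalues of cov_a @ cov_b are real and non-negative up to round-off,
    # so clip tiny negatives before taking square roots.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
```

Identical feature sets give a distance of zero. A pure mean shift contributes exactly its squared norm, because the covariances are unchanged.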

Attack Success Rate: SHIFT achieves 95–100% ASR on every method, with an average ASR of 97.8%, outperforming black-box and latent-noise removal attacks.
Semantic and Distributional Quality: Attacked images retain high semantic consistency and achieve the lowest FID among the compared attacks, owing to guidance by the pretrained score function and stochastic trajectory generation.
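The exact ASR protocol (detector scores, thresholds, target false-positive rate) is not given in this summary. A common convention, assumed here, works in two steps:

1. Calibrate the detection threshold on non-watermarked images at a fixed false-positive rate.
2. Count an attack as successful when the attacked watermarked image no longer exceeds that threshold.

The helper names and the 1% FPR default are hypothetical.

```python
import numpy as np


def calibrate_threshold(clean_scores, target_fpr=0.01):
    """Hypothetical helper: threshold that flags about target_fpr of clean images.

    Assumes a higher detector score means "more watermark-like".
    """
    return float(np.quantile(np.asarray(clean_scores, dtype=float), 1.0 - target_fpr))


def attack_success_rate(attacked_scores, threshold):
    """Hypothetical helper: share of attacked watermarked images no longer flagged."""
    scores = np.asarray(attacked_scores, dtype=float)
    return float(np.mean(scores <= threshold))


# Made-up scores for illustration only; these are not results from the paper.
clean = np.arange(100)            # detector scores on non-watermarked images
thr = calibrate_threshold(clean)  # 99th percentile of the clean scores
asr = attack_success_rate([10, 50, 99, 120], thr)
```

Here two of the four attacked images fall at or below the threshold, so the sketch reports an ASR of 0.5.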
Figure 2: Comparison of $L_1$ and $L_2$ noise distances across nine watermarking methods, evidencing substantial trajectory decoupling via SHIFT.

Trajectory decoupling is empirically verified through analysis of noise distances between the DDIM-inverted recovered noise and the embedded watermark noise. SHIFT not only achieves maximal displacement from the watermark trajectory but does so consistently across methods, as detailed in Figure 2.
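The noise-distance measurement can be sketched as below. The per-element normalization (mean absolute difference for L1, root-mean-square difference for L2) is an assumption that makes values comparable across latent sizes; the paper may normalize differently. The inputs would be the DDIM-inverted noise and the embedded watermark noise, whose extraction is not shown.

```python
import numpy as np


def noise_distances(recovered, watermark):
    """Per-element L1 (mean absolute) and L2 (root-mean-square) noise distances.

    The per-element normalization is an assumption; the paper may use raw norms.
    """
    diff = np.asarray(recovered, dtype=float) - np.asarray(watermark, dtype=float)
    l1 = float(np.mean(np.abs(diff)))
    l2 = float(np.sqrt(np.mean(diff**2)))
    return l1, l2


# Reference values for two independent N(0, 1) noise tensors under this normalization.
RANDOM_L1 = 2.0 / np.sqrt(np.pi)  # about 1.128
RANDOM_L2 = np.sqrt(2.0)          # about 1.414
```

A recovered noise that still matches the watermark gives distances near 0. Full decoupling gives values near `RANDOM_L1` and `RANDOM_L2`.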

Figure 3: Mean $L_1$ and $L_2$ noise distances as functions of attack strength $\lambda$ across nine watermarking methods, demonstrating monotonic progression toward random baseline noise as trajectory deflection increases.

Noise-distance curves as a function of $\lambda$ converge toward the distance expected between independent random Gaussian noise samples as $\lambda \rightarrow 1$, irrespective of watermark structure, consistent with the theoretical predictions.
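The random baseline that these curves approach can be made explicit. Assume unit-variance Gaussian noise; the paper's exact normalization of the $L_1$ and $L_2$ distances is not stated here. Two independent draws, and a recovered noise $\hat\epsilon$ with per-coordinate correlation $\rho$ to the watermark noise $\epsilon$, then give:

$$
\begin{aligned}
\epsilon, \epsilon' &\overset{\text{iid}}{\sim} \mathcal{N}(0, I_d) \;\Rightarrow\; \epsilon - \epsilon' \sim \mathcal{N}(0, 2I_d),\\
\mathbb{E}\,\lvert \epsilon_i - \epsilon'_i \rvert &= \sqrt{2}\cdot\sqrt{2/\pi} = 2/\sqrt{\pi} \approx 1.128,\\
\mathbb{E}\,\lVert \epsilon - \epsilon' \rVert_2^2 &= 2d \;\Rightarrow\; \lVert \epsilon - \epsilon' \rVert_2 \approx \sqrt{2d},\\
\mathbb{E}\,\lVert \hat\epsilon - \epsilon \rVert_2^2 &= \sum_{i=1}^{d} \big(1 + 1 - 2\rho\big) = 2d\,(1-\rho).
\end{aligned}
$$

As deflection drives $\rho$ toward 0, the expected squared $L_2$ distance therefore rises monotonically to the independent-noise value $2d$. Take, as an assumed example, a 4×64×64 latent (the Stable Diffusion latent shape at 512×512 resolution). Then $d = 16384$ and $\sqrt{2d} \approx 181.0$.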

Implications and Future Perspectives

SHIFT exposes a fundamental vulnerability in diffusion watermarking predicated on trajectory-preserving verification. The practical implication is that watermarking schemes relying on deterministic invertibility, even those with empirically robust embedding, are universally susceptible to stochastic trajectory-deflection attacks. Theoretically, this result provokes a reconsideration of watermark embedding: future provenance mechanisms must devise verification strategies resilient to stochastic sampling and non-invertible generative dynamics.

Potential future directions include:

Trajectory-Agnostic Provenance: Designing watermark verification protocols robust to stochastic latent trajectory modifications.
Extension to Video Diffusion: Addressing temporal coherence and cross-frame constraints for robust watermarking/attack strategies in video.
Hybrid Generative Models: Investigating whether analogous vulnerabilities manifest in scale-wise autoregressive or flow-matching models, which also entail trajectory dependencies.

Conclusion

By systematically disrupting trajectory-level dependencies through stochastic reverse sampling, SHIFT universally removes diffusion-based watermarks with minimal semantic degradation and superior image quality. The attack operates efficiently and model-agnostically, achieving empirical and theoretical decoupling from watermark-carrying trajectories across diverse watermarking paradigms. The findings necessitate a broader security paradigm for provenance in generative AI, beyond the current reliance on trajectory consistency and deterministic inversion.

["SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark" (2603.29742)]
