- The paper presents the SHIFT attack, which disrupts deterministic inversion using partial forward diffusion and stochastic reverse sampling to remove watermarks.
- It achieves a 95–100% attack success rate across nine watermark schemes while maintaining high semantic fidelity and low FID scores.
- The study exposes vulnerabilities in diffusion watermarking and calls for trajectory-agnostic verification methods for robust AI content provenance.
SHIFT: Stochastic Hidden-Trajectory Deflection for Watermark Removal in Diffusion Models
Overview and Motivation
The proliferation of high-fidelity diffusion-based image synthesis has created urgent demand for reliable provenance of AI-generated content, and diffusion watermarking has emerged as a principal paradigm for robust embedding and verification. Conventional watermark-removal attacks (regeneration, pixel-space perturbation, and latent-space optimization) either fail to remove deeply coupled semantic marks or incur prohibitive computational cost. This paper introduces "SHIFT: Stochastic Hidden-Trajectory Deflection," a training-free watermark removal attack premised on breaking the trajectory consistency assumption that universally underpins diffusion watermark verifiers.
Methodology: Trajectory Deflection via Stochastic Sampling
SHIFT operationalizes watermark removal by exploiting stochasticity in reverse diffusion. The core insight is that watermark verification fundamentally relies on deterministic reversibility, wherein the trajectory between the initial noise and the final image remains reconstructable via inversion (typically DDIM). SHIFT disrupts this dependency in two stages:
- Partial Forward Diffusion: The watermarked image is encoded into the latent space and subjected to controlled re-noising. The parameter λ defines the re-noising strength, determining the depth of forward diffusion, which attenuates trajectory-specific information while preserving the semantic scaffold.
- Stochastic Reverse Resampling: From the partially noised latent, reverse diffusion is performed with an ancestral stochastic sampler, injecting fresh Gaussian noise (the discrete counterpart of Brownian motion) at every step. Unlike deterministic DDIM, stochastic sampling decouples the reconstructed image from the original watermark-embedded trajectory, because each reverse sampling run traces a distinct path in latent space.
Figure 1: The overall framework of SHIFT, incorporating partial forward diffusion and stochastic reverse resampling to deflect watermark-carrying trajectories.
No watermark knowledge, retraining, or adversarial optimization is needed. SHIFT requires only a publicly available latent diffusion model.
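The two stages above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the mapping of λ to a timestep via `round(lam * T)`, and all function names (`make_alpha_bar`, `partial_forward`, `stochastic_reverse`, `shift_attack_sketch`) are hypothetical. The noise predictor `eps_model` stands in for a latent diffusion model's denoising network, and `z0` for a VAE-encoded watermarked image.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def partial_forward(z0, lam, alpha_bar, rng):
    """Stage 1: re-noise latent z0 up to timestep t = round(lam * T) in closed form."""
    t = int(round(lam * len(alpha_bar)))
    if t == 0:
        return z0.copy(), 0
    a = alpha_bar[t - 1]
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise, t

def stochastic_reverse(zt, t, alpha_bar, eps_model, rng):
    """Stage 2: ancestral (DDPM-style) reverse sampling from step t down to 0."""
    z = zt
    for step in range(t, 0, -1):
        a_t = alpha_bar[step - 1]
        a_prev = alpha_bar[step - 2] if step > 1 else 1.0
        eps = eps_model(z, step)
        # Predicted clean latent from the current noisy latent.
        z0_hat = (z - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Ancestral noise scale; it is exactly zero on the final step.
        sigma = np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
        direction = np.sqrt(max(1.0 - a_prev - sigma**2, 0.0))
        fresh = rng.standard_normal(z.shape)  # new randomness deflects the trajectory
        z = np.sqrt(a_prev) * z0_hat + direction * eps + sigma * fresh
    return z

def shift_attack_sketch(z0, lam, eps_model, alpha_bar, seed=0):
    """Partial forward diffusion followed by stochastic reverse resampling."""
    rng = np.random.default_rng(seed)
    zt, t = partial_forward(z0, lam, alpha_bar, rng)
    return stochastic_reverse(zt, t, alpha_bar, eps_model, rng)
```

For a quick check without a trained model, a toy predictor such as `lambda z, step: np.sqrt(1 - alpha_bar[step - 1]) * z` (the optimal noise predictor when the data are standard Gaussian) can be passed as `eps_model`. Two calls with different seeds then return different latents from the same input, which is the property the attack relies on.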
Theoretical Guarantees and Analysis
A rigorous Wasserstein distance analysis supports the trajectory-decoupling claim. Formally, the noise recovered from the attacked sample is shown to be approximately independent of the original watermark-carrying noise. The bound, governed by cumulative Lipschitz constants and signal retention coefficients, quantifies the decoupling as a function of attack strength. The theoretical results indicate that sufficiently deep forward diffusion paired with stochastic reverse resampling drives the recovered noise toward statistical independence from the watermark, so verification fails regardless of the embedding paradigm.
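To give some intuition for how a signal retention coefficient shrinks with attack strength, the standard DDPM forward process can be written in closed form. This is an illustrative assumption: the symbols $z_0$, $\bar\alpha_t$, $\beta_s$, and $\epsilon$ are not taken from the paper, whose exact parameterization and bound may differ.

$$
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s).
$$

If each coordinate of $z_0$ has zero mean and unit variance and is independent of $\epsilon$, then $\operatorname{Var}(z_t) = \bar\alpha_t + (1-\bar\alpha_t) = 1$ and $\operatorname{Cov}(z_t, z_0) = \sqrt{\bar\alpha_t}$, so per coordinate

$$
\operatorname{Corr}(z_t, z_0) = \sqrt{\bar\alpha_t}.
$$

Because $\bar\alpha_t$ decreases monotonically in $t$, deeper re-noising retains less of the original latent. This covers only the forward half of the attack; the Lipschitz constants in the paper's bound concern how the remaining signal propagates through reverse sampling and inversion.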
Crucially, deterministic resampling (e.g., DDIM) preserves residual watermark structure due to the many-to-one mapping problem in latent space; stochasticity is indispensable for complete trajectory deflection.
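The contrast between deterministic and stochastic resampling can be made concrete with the generalized sampler family from the DDIM formulation. This is a sketch in assumed notation ($\epsilon_\theta$ for the noise predictor, $\eta$ for the stochasticity knob); the summary does not state which exact sampler settings SHIFT uses.

$$
z_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat z_0
+ \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(z_t, t)
+ \sigma_t\,\xi_t,
\qquad
\hat z_0 = \frac{z_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(z_t, t)}{\sqrt{\bar\alpha_t}},
\qquad
\xi_t \sim \mathcal{N}(0, I),
$$

$$
\sigma_t = \eta\,\sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\;\sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}.
$$

With $\eta = 0$ the noise term vanishes and each $z_t$ maps to a single $z_{t-1}$, which is the deterministic DDIM update that inversion can retrace. With $\eta = 1$ the update is the DDPM ancestral sampler: every step draws a fresh $\xi_t$, so repeated runs from the same $z_t$ diverge.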
Empirical Evaluation
Experimental assessment spans nine watermarking schemes—noise-space (Tree-Ring, RingID, PRC, WIND), frequency-domain (Gaussian Shading, GaussMarker, SFW), and optimization-driven (ROBIN, SEAL)—with quantitative and qualitative evaluation on semantic fidelity (CLIP score), distributional quality (FID), and attack success rate (ASR).
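The summary does not define ASR precisely. A common convention, assumed here, is the fraction of attacked watermarked images on which the detector no longer reports the watermark. The function name and input format below are hypothetical.

```python
from typing import Iterable

def attack_success_rate(still_detected: Iterable[bool]) -> float:
    """Fraction of attacked images whose watermark is no longer detected.

    still_detected[i] is True when the verifier still flags attacked image i.
    """
    flags = list(still_detected)
    if not flags:
        raise ValueError("need at least one attacked image")
    evaded = sum(1 for detected in flags if not detected)
    return evaded / len(flags)
```

For example, `attack_success_rate([False, False, True, False])` returns `0.75`: three of the four attacked images evade detection.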
Trajectory decoupling is empirically verified by measuring the distance between the DDIM-inverted recovered noise and the embedded watermark noise. SHIFT not only achieves maximal displacement from the watermark trajectory but does so consistently across methods, as detailed in Figure 3.
Figure 3: Mean L1 and L2 noise distances as functions of attack strength λ across nine watermarking methods, demonstrating monotonic progression toward random baseline noise as trajectory deflection increases.
Noise distance curves as a function of λ show convergence to a random Gaussian regime at λ→1, irrespective of watermark structure, confirming the theoretical predictions.
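A minimal sketch of such a distance measurement follows. The per-element normalization (mean absolute difference for L1, root-mean-square difference for L2) is an assumption, since the summary does not give the paper's exact definition, and the function names are hypothetical. The random baseline is the distance between two independent standard Gaussian noise tensors.

```python
import numpy as np

def noise_distances(recovered, reference):
    """Per-element mean L1 and root-mean-square L2 distance between two noise tensors."""
    diff = np.asarray(recovered, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    l1 = np.mean(np.abs(diff))
    l2 = np.sqrt(np.mean(diff**2))
    return l1, l2

def random_baseline(shape, trials=16, seed=0):
    """Average distances between pairs of independent standard Gaussian tensors."""
    rng = np.random.default_rng(seed)
    vals = [
        noise_distances(rng.standard_normal(shape), rng.standard_normal(shape))
        for _ in range(trials)
    ]
    return tuple(np.mean(vals, axis=0))
```

Under this normalization the difference of two independent standard Gaussians has variance 2, so the baseline values are close to 2/√π ≈ 1.128 for L1 and √2 ≈ 1.414 for L2. Curves that approach these levels as λ grows would indicate that the recovered noise has become indistinguishable from an unrelated random draw.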
Implications and Future Perspectives
SHIFT exposes a fundamental vulnerability in diffusion watermarking predicated on trajectory-preserving verification. The practical implication is that watermarking schemes relying on deterministic invertibility, even those with empirically robust embedding, are universally susceptible to stochastic trajectory-deflection attacks. Theoretically, this result provokes a reconsideration of watermark embedding: future provenance mechanisms must devise verification strategies resilient to stochastic sampling and non-invertible generative dynamics.
Potential future directions include:
- Trajectory-Agnostic Provenance: Designing watermark verification protocols robust to stochastic latent trajectory modifications.
- Extension to Video Diffusion: Addressing temporal coherence and cross-frame constraints for robust watermarking/attack strategies in video.
- Hybrid Generative Models: Investigating whether analogous vulnerabilities manifest in scale-wise autoregressive or flow-matching models, which also entail trajectory dependencies.
Conclusion
By systematically disrupting trajectory-level dependencies through stochastic reverse sampling, SHIFT removes diffusion-based watermarks across all nine evaluated schemes while maintaining high semantic fidelity and low FID. The attack is training-free, efficient, and model-agnostic, achieving empirical and theoretical decoupling from watermark-carrying trajectories across diverse watermarking paradigms. The findings call for a broader security paradigm for provenance in generative AI, beyond the current reliance on trajectory consistency and deterministic inversion.
["SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark" (2603.29742)]