- The paper demonstrates that robust watermarking amplifies residual identity leakage, undermining authentication fidelity.
- It employs a residual information loss objective to minimize watermark leakage while preserving robustness and clean bit accuracy.
- Extensive empirical and certified evaluations confirm that integrating information bottlenecks effectively balances watermark strength with security.
Identity Leakage in Robust Post-Processing Image Watermarking: Analysis and Mitigation
Overview and Main Contributions
The paper "'Training robust watermarking model may hurt authentication!' Exploring and Mitigating the Identity Leakage in Robust Watermarking" (2605.09646) analyzes post-processing image watermarking and exposes an overlooked vulnerability, identity leakage, which robust watermarking protocols exacerbate. The authors introduce W-IR, a watermarking framework that combines certified robustness with mitigation of identity leakage through a residual information minimization objective. The paper supports these claims with formal theoretical analysis and empirical evaluation across multiple datasets and watermarking strategies.
Figure 1: Main contributions: (1) the discovery of identity leakage and corresponding attacks in post-processing watermarking, especially worsened in the robust case; (2) W-IR, robust watermarking with mitigated identity leakage.
Identity Leakage: Threat Model and Attack Taxonomy
The core vulnerability arises from the residual images produced by subtracting the original image from the watermarked image. These residuals retain stable, secret watermark patterns, enabling practical attacks:
- Identity Linking: Residual images cluster by user identity, exposing confidential watermark information even without explicit decoding. Clustering analysis demonstrates significantly reduced intra-cluster distances for robust models, indicating stronger identity correlation.
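The identity-linking measurement can be sketched on synthetic data. The example below is a minimal illustration, not the paper's pipeline: it assumes the attacker holds (original, watermarked) pairs, models each user's watermark as a fixed additive pattern, and scores clustering with a plain NumPy silhouette coefficient. The names `residuals` and `silhouette_score`, and all data, are hypothetical.

```python
import numpy as np

def residuals(watermarked, originals):
    """Residual = watermarked image minus original image, flattened per image."""
    diff = watermarked.astype(np.float64) - originals.astype(np.float64)
    return diff.reshape(len(diff), -1)

def silhouette_score(features, labels):
    """Mean silhouette coefficient with Euclidean distance (plain NumPy)."""
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all feature vectors.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    scores = []
    for i in range(len(features)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette is 0 by convention
            continue
        a = dists[i, same].mean()  # mean distance to own cluster
        b = min(dists[i, labels == other].mean()  # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Synthetic stand-in (assumption): 4 users, each with one fixed secret pattern.
rng = np.random.default_rng(0)
n_users, per_user, shape = 4, 10, (8, 8)
patterns = rng.normal(0.0, 1.0, size=(n_users, *shape))
labels = np.repeat(np.arange(n_users), per_user)
originals = rng.uniform(0.0, 255.0, size=(n_users * per_user, *shape))
watermarked = originals + patterns[labels] + rng.normal(0.0, 0.1, size=originals.shape)

features = residuals(watermarked, originals)
score = silhouette_score(features, labels)
print(f"silhouette score of residuals grouped by user: {score:.3f}")
```

A score close to 1 means residuals from the same user sit much closer to each other than to other users' residuals, which is the signature of identity leakage described above.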

Figure 2: Residual images from four users cluster coherently, revealing identity information under both COCO and CelebA datasets.


Figure 4: Forging watermarked images on StegaStamp; visual quality and decoding accuracy improve with m and ζ.
These attacks become more effective as model robustness increases: both empirical and certified robustness protocols intensify identity leakage. Robust training inadvertently increases the decoder's capacity to extract watermark information from residuals rather than confining it to the watermarked image alone.
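One way to picture a residual-based forgery is to average residuals and overlay the result on an unrelated image. The sketch below is an assumption about the attack's general shape, not the paper's exact procedure: it treats m as the number of residuals averaged and zeta as the overlay multiplier, and it uses synthetic data. The names `estimate_pattern`, `forge`, and `sample_pairs` are hypothetical.

```python
import numpy as np

def estimate_pattern(watermarked, originals):
    """Average the residuals of m (watermarked, original) pairs from one user."""
    diff = watermarked.astype(np.float64) - originals.astype(np.float64)
    return diff.mean(axis=0)

def forge(clean_image, pattern, zeta=1.0):
    """Overlay the estimated pattern, scaled by zeta, onto an unrelated clean image."""
    return np.clip(clean_image.astype(np.float64) + zeta * pattern, 0.0, 255.0)

# Synthetic stand-in (assumption): one stable pattern plus image-dependent variation.
rng = np.random.default_rng(0)
true_pattern = rng.normal(0.0, 2.0, size=(16, 16))

def sample_pairs(m):
    originals = rng.uniform(20.0, 235.0, size=(m, 16, 16))
    variation = rng.normal(0.0, 1.0, size=originals.shape)
    return originals + true_pattern + variation, originals

err_small = np.abs(estimate_pattern(*sample_pairs(1)) - true_pattern).mean()
err_large = np.abs(estimate_pattern(*sample_pairs(64)) - true_pattern).mean()

target = rng.uniform(20.0, 235.0, size=(16, 16))
forged = forge(target, estimate_pattern(*sample_pairs(64)), zeta=1.5)
print(f"pattern error with m=1: {err_small:.3f}, with m=64: {err_large:.3f}")
```

Under this toy model, averaging more residuals cancels the image-dependent variation and leaves the stable pattern, which is consistent with forgery quality improving as m grows.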
Robustness in Neural Watermarking: Empirical and Certified Approaches
Classic post-processing watermarking (StegaStamp, HiDDeN) employs encoder-decoder neural architectures with loss terms for message reconstruction, visual fidelity (LPIPS), and adversarial regularization.
Robust watermarking introduces noise simulation layers or adversarial augmentations:
- Empirical Robustness (W-ER): Adversarial training and augmentation improve resilience but increase identity leakage.
- Certified Robustness (W-CR): The paper extends randomized smoothing techniques to watermark authentication, providing formal robustness guarantees against pixel and coordinate perturbations, including additive Gaussian noise and affine transformations.





Figure 6: Typical distortions for certified robustness: pixel noise, coordinate perturbations.
Certified robustness is achieved using smoothed classification bounds. The authentication model h is constrained such that prediction is invariant for perturbations within a certified radius R (determined by noise level and prediction confidence).
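How R depends on noise level and prediction confidence can be illustrated with the standard Gaussian randomized smoothing bound for a binary decision, R = sigma * Phi^{-1}(p), where p is a lower confidence bound on the probability of the predicted class under noise. This is a sketch of that generic bound, assumed here for illustration; the paper's exact certificate, especially for coordinate perturbations, may differ. The names `certified_radius`, `p_lower`, and `sigma` are hypothetical.

```python
from statistics import NormalDist

def certified_radius(p_lower, sigma):
    """L2 radius sigma * Phi^{-1}(p_lower); returns 0.0 (abstain) if p_lower <= 0.5."""
    if not 0.0 < p_lower < 1.0:
        raise ValueError("p_lower must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if p_lower <= 0.5:
        return 0.0  # no majority under noise, so nothing can be certified
    return sigma * NormalDist().inv_cdf(p_lower)

# Higher confidence or a larger smoothing noise level yields a larger radius.
for p in (0.6, 0.9, 0.99):
    print(f"p_lower={p:.2f}, sigma=0.5 -> R={certified_radius(p, 0.5):.3f}")
```

The radius grows with both arguments, but a larger sigma usually lowers the achievable p_lower in practice, so the two have to be tuned together.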



Figure 7: Certified accuracy at different radii under additive noise. Accuracy decreases with radius but remains high within bounds.
Empirical evaluation demonstrates near-perfect certified accuracy (up to 99.5% or higher) under realistic noise levels, with minimal sacrifice in clean bit accuracy or visual quality.
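Certified accuracy at radius r is commonly computed as the fraction of test samples that are both predicted correctly and certified with a radius of at least r. The sketch below follows that common definition, which is assumed rather than taken from the paper; the function name and the toy numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def certified_accuracy(radii, correct, thresholds):
    """Fraction of samples that are correct AND certified at each threshold radius."""
    radii = np.asarray(radii, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return [float(np.mean(correct & (radii >= r))) for r in thresholds]

# Toy values for illustration only.
radii = [0.0, 0.3, 0.6, 0.9]
correct = [True, True, True, False]
thresholds = [0.0, 0.25, 0.5, 1.0]
curve = certified_accuracy(radii, correct, thresholds)
print(curve)  # [0.75, 0.5, 0.25, 0.0]
```

Because the condition `radii >= r` can only become stricter as r grows, the resulting curve is non-increasing, matching the downward trend with radius.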
To mitigate identity leakage, the paper formalizes a residual information bottleneck objective. The watermarked image should maximize mutual information with the secret watermark I(w;t), while minimizing mutual information between residual image and watermark I(z;t):
$$\max \; I(w; t) \quad \text{while} \quad \min \; I(z; t)$$
Direct mutual information estimation is intractable, so the authors employ variational bounds and a KL-divergence approximation (the "residual information loss"). Optimizing this objective via additional encoder training substantially reduces identity leakage without degrading robustness or authentication.
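A common variational route to such a penalty upper-bounds I(z;t) by the expected KL divergence between a conditional distribution p(z|t) and a fixed prior q(z). The sketch below assumes a diagonal Gaussian p(z|t) and a standard normal prior, as in variational information bottleneck methods. The summary does not give the paper's exact bound or parameterization, so the function names, the `mu`/`log_var` inputs, and the weight `beta` are all hypothetical.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def residual_information_penalty(mu, log_var, beta=1.0):
    """Batch-averaged KL term, scaled by an assumed trade-off weight beta."""
    return beta * float(np.mean(gaussian_kl_to_standard_normal(mu, log_var)))

# Residual statistics that match the prior carry no information about the watermark.
no_leak = residual_information_penalty(np.zeros((4, 2)), np.zeros((4, 2)))
# Shifting the mean away from the prior makes the penalty positive.
leaky = residual_information_penalty(np.ones((4, 2)), np.zeros((4, 2)), beta=0.5)
print(no_leak, leaky)
```

In training, a term of this kind would be added to the usual message-reconstruction and fidelity losses, so that the encoder is pushed to keep watermark-dependent structure out of the residual.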
Figure 8: Information content in feature representations; residual information loss encourages maximal identity retention in watermark, minimal leakage in residual.
Figure 9: Schematic of residual information loss: the mitigation pathway during robust watermark training.
Strong Empirical Findings and Contradictory Observations
- Identity leakage is strongly intensified by robust watermarking: Empirical and certified robust models show much higher silhouette scores (identity linking), forgery bit accuracy, and extraction accuracy compared to clean vanilla models, especially with StegaStamp.
- Residual information loss restores identity protection: Models trained with this loss achieve leakage rates comparable to, or lower than, clean models, while maintaining high robustness.
- Certified robustness achieves high authentication accuracy: W-CR preserves high certified accuracy even under significant geometric and noise perturbations.
- Trade-off between robustness and identity protection can be balanced: The introduction of residual information loss does not impair robustness certification or clean accuracy.
Figure 10: Three-facet performance visualization of authenticity, robustness, and identity protection across watermarking strategies. W-IR achieves superior balance.
Figure 11: Impact of m (number of images) and ζ (overlay multiplier) on COCO (StegaStamp). Identity leakage scales with both but can be attenuated via information bottleneck training.
Practical and Theoretical Implications
The explicit demonstration of robust watermarking exacerbating identity leakage challenges conventional assumptions in watermark authentication security.
- For forensic and copyright applications: Invisible watermarks deployed with robust training are vulnerable to forgery and extraction, even in black-box deployment.
- For watermarking system design: Information bottleneck objectives must be integrated to confine watermark information within the intended image and prevent residual-based leakage.
- For adversarial attack evaluation: Certified robustness certificates must include identity leakage analysis to validate both security and privacy.
Methodologically, the adaptation of randomized smoothing for coordinate-level perturbations contributes to robust watermark certification beyond pixel adversaries.
Future Directions in Secure AI Watermarking
Theoretical extensions should address adversaries with more powerful generative or optimization-based capabilities, including attacks exploiting semantic correlation or multimodal watermark embeddings. Further, the interplay of watermark information, image semantics, and generative model outputs in in-processing watermarking suggests new attack vectors requiring robust information-theoretic mitigation.
Integration with generative AI ecosystems, large-scale deployment, and regulatory compliance (copyright, evidence provenance) demand scalable, efficient, and provably secure watermarking frameworks that combine robustness certification with formal leakage mitigation.
Conclusion
The paper provides an analytical framework for understanding and mitigating identity leakage in robust post-processing image watermarking. By formalizing and empirically validating the leakage phenomenon, especially under robust training, and presenting an effective residual information bottleneck mitigation, it sets a reference point for secure watermarking practices in generative AI and digital content protection.