DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model (2405.02696v2)
Abstract: Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization. Adding watermarks to generative model outputs is a vital technique for copyright tracking and for mitigating potential risks associated with AI-generated content. However, post-processed watermarking methods are unable to withstand generative watermark attacks, and there exists a trade-off between image fidelity and watermark strength. Therefore, we propose a novel technique called DiffuseTrace. DiffuseTrace does not rely on fine-tuning of the diffusion model components. The multi-bit watermark is embedded into the image space semantically without compromising image quality. The watermark component can be utilized as a plug-in in arbitrary diffusion models. We validate the effectiveness and flexibility of DiffuseTrace through experiments. Under 8 types of image processing watermark attacks and 3 types of generative watermark attacks, DiffuseTrace maintains a watermark detection rate of 99% and an attribution accuracy of over 94%.