Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space (2404.00230v3)
Abstract: Watermarking is a tool for actively identifying and attributing images generated by latent diffusion models. Existing methods face a trade-off between image quality and watermark robustness: watermarks that preserve image quality are usually less robust to attacks such as blurring and JPEG compression, while robust watermarks usually degrade image quality significantly. This dilemma stems from the traditional paradigm in which watermarks are injected and detected in pixel space, so that both detection and resilience against attacks rely on pixel perturbation. In this paper, we highlight that an effective solution is to both inject and detect watermarks in the latent diffusion space, and we propose Latent Watermark (LW) with a progressive training strategy. LW weakens the direct connection between quality and robustness and thus alleviates the conflict between them. We conduct evaluations on two datasets and against 10 watermark attacks, using six metrics to measure image quality and watermark robustness. Results show that, compared with recently proposed methods such as StableSignature, StegaStamp, RoSteALS, LaWa, TreeRing, and DiffuseTrace, LW not only surpasses them in robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.
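To make the idea of injecting and detecting a watermark in latent space concrete, the following is a minimal toy sketch, not the LW method itself. It assumes a fixed spread-spectrum scheme in place of the learned encoder and decoder the abstract implies, and it stands in for the image round trip (decoding to pixels, attack, re-encoding) with additive Gaussian noise on the latent. The names `make_carriers`, `inject`, and `detect`, the latent shape, and the `strength` parameter are all hypothetical.

```python
import numpy as np


def make_carriers(n_bits, latent_dim, seed=0):
    """Hypothetical secret key: one orthonormal carrier direction per bit."""
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix yields n_bits orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((latent_dim, n_bits)))
    return q.T  # shape (n_bits, latent_dim), rows are orthonormal


def inject(latent, bits, carriers, strength=4.0):
    """Write a bit string into a latent tensor along the carrier directions."""
    flat = latent.reshape(-1)
    signs = 2.0 * np.asarray(bits, dtype=float) - 1.0  # {0,1} -> {-1,+1}
    # Project out whatever the latent already has along the carriers,
    # then add the message with a fixed amplitude (assumed design choice).
    residual = flat - carriers.T @ (carriers @ flat)
    return (residual + strength * (carriers.T @ signs)).reshape(latent.shape)


def detect(latent, carriers):
    """Read the bits back from the sign of each carrier projection."""
    return (carriers @ latent.reshape(-1) > 0).astype(int)


# Toy usage: a random tensor stands in for a diffusion latent.
rng = np.random.default_rng(1)
latent = rng.standard_normal((4, 32, 32))
message = rng.integers(0, 2, size=48)
carriers = make_carriers(n_bits=48, latent_dim=latent.size)

marked = inject(latent, message, carriers)
# Additive noise is a crude stand-in for decode -> attack -> re-encode.
attacked = marked + 0.2 * rng.standard_normal(marked.shape)
bit_accuracy = float(np.mean(detect(attacked, carriers) == message))
```

In this sketch the message amplitude is set in the latent rather than in pixels, which mirrors the abstract's point that detection no longer depends directly on the size of the pixel perturbation. How a latent change maps to visible image change depends on the decoder, which this toy example does not model.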