
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space (2404.00230v3)

Published 30 Mar 2024 in cs.CV

Abstract: Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image quality. This dilemma stems from the traditional paradigm where watermarks are injected and detected in pixel space, relying on pixel perturbation for watermark detection and resilience against attacks. In this paper, we highlight that an effective solution to the problem is to both inject and detect watermarks in the latent diffusion space, and propose Latent Watermark with a progressive training strategy. It weakens the direct connection between quality and robustness and thus alleviates their contradiction. We conduct evaluations on two datasets and against 10 watermark attacks. Six metrics measure the image quality and watermark robustness. Results show that compared to the recently proposed methods such as StableSignature, StegaStamp, RoSteALS, LaWa, TreeRing, and DiffuseTrace, LW not only surpasses them in terms of robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.


Summary

  • The paper introduces Latent Watermark (LW), a novel method that injects and detects watermarks in the latent space, decoupling robustness from pixel-level quality.
  • The paper employs a progressive training strategy to achieve nearly 100% identification and above 97% attribution performance across multiple attack scenarios.
  • The paper's comparative experiments demonstrate that LW outperforms existing methods like StegaStamp and RoSteALS, offering a practical solution for digital content protection.

The paper, "Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space," introduces an approach to watermarking images generated by latent diffusion models. Traditional watermarking faces a trade-off between image quality and watermark robustness because watermarks are injected and detected in pixel space: the pixel perturbations needed for reliable detection directly degrade image quality.

Key Contributions

  1. Latent Space Watermarking: The authors propose a method called Latent Watermark (LW), which both injects and detects watermarks within the latent space of diffusion models. This approach decouples the watermark's robustness from the image's pixel-level quality.
  2. Progressive Training Strategy: LW uses a progressive training strategy to enhance the effectiveness of the watermarking process. This technique allows the watermarking model to gradually learn to integrate the watermark in a more robust manner.
  3. Comparative Performance: The paper conducts thorough experiments comparing LW with existing methods such as StegaStamp, StableSignature, RoSteALS, and TreeRing. LW demonstrates superior performance in terms of both robustness and image quality.
  4. Robustness and Quality: When embedding 64-bit messages, LW achieves nearly 100% identification performance and above 97% attribution performance across various attack scenarios. These scenarios include nine single-attack conditions and one comprehensive all-attack scenario, highlighting LW's resilience and effectiveness.
  5. Practical Implications: The method improves watermark robustness without compromising the visual quality of images generated by diffusion models, making it a practical solution for real-world applications in content attribution and copyright management.
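The latent-space injection and detection described above can be illustrated with a minimal toy sketch. This is not the authors' implementation (which jointly trains message injection and extraction modules in the diffusion model's latent space); here a hypothetical "injector" simply adds a small signed combination of fixed orthonormal carrier directions to a latent vector, and the "detector" recovers the message bits from the sign of the latent's correlation with each carrier.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64   # toy latent size; real LDM latents are far larger (e.g. 4x64x64)
MSG_BITS = 8      # toy message length; the paper embeds 64-bit messages

# Hypothetical fixed carriers: one orthonormal direction per message bit,
# obtained from a QR decomposition of a random matrix.
carriers = np.linalg.qr(rng.standard_normal((LATENT_DIM, MSG_BITS)))[0].T

def inject(latent, bits, strength=0.05):
    """Add a small signed combination of carrier directions to the latent."""
    signs = 2.0 * np.asarray(bits, dtype=float) - 1.0   # map {0,1} -> {-1,+1}
    return latent + strength * signs @ carriers

def detect(latent):
    """Recover bits from the sign of the correlation with each carrier."""
    return (latent @ carriers.T > 0).astype(int)

latent = 0.01 * rng.standard_normal(LATENT_DIM)   # stands in for an image latent
message = rng.integers(0, 2, MSG_BITS)
watermarked = inject(latent, message)
recovered = detect(watermarked)   # carrier signal dominates, so bits come back intact
```

In the actual method, both injection and detection are learned and strengthened via the progressive training strategy, and the watermarked latent is decoded to pixels by the LDM decoder; detecting in latent space rather than in the decoded pixels is what loosens the coupling between robustness and pixel-level quality.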

The authors also indicate plans to release their code on GitHub, which could facilitate further research and application development in digital watermarking technologies. This contribution is particularly relevant for industries relying on digital content creation and protection against unauthorized use.
