Robust-Wide: Robust Watermarking against Instruction-driven Image Editing (2402.12688v3)
Abstract: Instruction-driven image editing allows users to quickly edit an image according to text instructions in a forward pass. Nevertheless, malicious users can easily exploit this technique to create fake images, which could cause a crisis of trust and harm the rights of the original image owners. Watermarking is a common solution to trace such malicious behavior. Unfortunately, instruction-driven image editing can significantly change the watermarked image at the semantic level, making current state-of-the-art watermarking methods ineffective. To remedy it, we propose Robust-Wide, the first robust watermarking methodology against instruction-driven image editing. Specifically, we follow the classic structure of deep robust watermarking, consisting of the encoder, noise layer, and decoder. To achieve robustness against semantic distortions, we introduce a novel Partial Instruction-driven Denoising Sampling Guidance (PIDSG) module, which consists of a large variety of instruction injections and substantial modifications of images at different semantic levels. With PIDSG, the encoder tends to embed the watermark into more robust and semantic-aware areas, which remains in existence even after severe image editing. Experiments demonstrate that Robust-Wide can effectively extract the watermark from the edited image with a low bit error rate of nearly 2.6% for 64-bit watermark messages. Meanwhile, it only induces a neglectable influence on the visual quality and editability of the original images. Moreover, Robust-Wide holds general robustness against different sampling configurations and other popular image editing methods such as ControlNet-InstructPix2Pix, MagicBrush, Inpainting, and DDIM Inversion. Codes and models are available at https://github.com/hurunyi/Robust-Wide.
- GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 16784–16804. PMLR, 17–23 Jul 2022.
- Hierarchical text-conditional image generation with clip latents, 2022.
- Photorealistic text-to-image diffusion models with deep language understanding. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 36479–36494. Curran Associates, Inc., 2022.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
- Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18392–18402, June 2023.
- Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., 2020.
- Prompt-to-prompt image editing with cross-attention control. In The Eleventh International Conference on Learning Representations, 2022.
- Hive: Harnessing human feedback for instructional visual editing, 2023.
- Magicbrush: A manually annotated dataset for instruction-guided image editing. arXiv preprint arXiv:2306.10012, 2023.
- Guiding instruction-based image editing via multimodal large language models, 2023.
- An image is worth one word: Personalizing text-to-image generation using textual inversion. In The Eleventh International Conference on Learning Representations, 2022.
- Md Maklachur Rahman. A dwt, dct and svd based watermarking technique to protect the image piracy. International Journal of Managing Public Sector Information and Communication Technologies, 4(2):21, 2013.
- Hidden: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
- MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. In Proceedings of the 29th ACM International Conference on Multimedia. ACM, oct 2021.
- Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2117–2126, 2020.
- Han Fang and et al. Pimog: An effective screen-shooting noise-layer simulation for deep-learning-based watermarking network. In ACM MM, pages 2267–2275, 2022.
- Deep unsupervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256–2265, Lille, France, 07–09 Jul 2015. PMLR.
- Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020.
- Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- Robust invisible video watermarking with attention. 2019.
- Light field messaging with deep photographic steganography. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1515–1524, 2019.
- Rihoop: Robust invisible hyperlinks in offline and online photographs. IEEE Transactions on Cybernetics, 52(7):7094–7106, 2020.
- SD invisible watermark. https://github.com/ShieldMnt/invisible-watermark.
- U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
- Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12873–12883, 2021.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Md. Maklachur Rahman. A DWT, DCT and SVD based watermarking technique to protect the image piracy. International Journal of Managing Public Sector Information and Communication Technologies, 4(2):21–32, jun 2013.
- Xinyu Li. Diffwa: Diffusion models for watermark attack, 2023.
- SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022.
- Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6038–6047, June 2023.