A Transfer Attack to Image Watermarks (2403.15365v3)
Abstract: Watermarks have been widely deployed by industry to detect AI-generated images. The robustness of such watermark-based detectors against evasion attacks in the white-box and black-box settings is well understood in the literature. However, their robustness in the no-box setting is much less understood. In this work, we propose a new transfer evasion attack against image watermarks in the no-box setting. Our transfer attack adds a perturbation to a watermarked image so that it evades multiple surrogate watermarking models trained by the attacker itself, and the perturbed watermarked image also evades the target watermarking model. Our major contribution is to show, both theoretically and empirically, that watermark-based AI-generated image detectors are not robust to evasion attacks even if the attacker has access to neither the watermarking model nor the detection API.
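The core idea in the abstract, a single perturbation that evades several attacker-trained surrogate models at once, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the surrogate decoders are random linear maps rather than trained networks, the "image" is a small mean-centered vector, the evasion objective is to flip every bit that each surrogate decodes, and the optimizer is a plain sign-gradient method on a hinge loss under an L-infinity budget `eps`. The names `decode_bits`, `ensemble_loss`, `ensemble_grad`, and `transfer_perturbation` are hypothetical.

```python
import numpy as np

def decode_bits(W, x):
    # Toy linear decoder (hypothetical stand-in for a trained surrogate):
    # bit i is 1 when row i of W gives x a positive score.
    return (W @ x > 0).astype(int)

def ensemble_loss(x, surrogates, targets):
    # Sum of hinge losses over all surrogates. It reaches zero once every
    # surrogate decodes its target bits with a margin of at least 1.
    total = 0.0
    for W, t in zip(surrogates, targets):
        s = 2 * t - 1  # target bits mapped to -1/+1
        total += np.maximum(0.0, 1.0 - s * (W @ x)).sum()
    return total

def ensemble_grad(x, surrogates, targets):
    # Gradient of ensemble_loss with respect to x; only bits whose margin
    # is still below 1 contribute.
    grad = np.zeros_like(x)
    for W, t in zip(surrogates, targets):
        s = 2 * t - 1
        active = s * (W @ x) < 1.0
        grad -= (s[active, None] * W[active]).sum(axis=0)
    return grad

def transfer_perturbation(x_wm, surrogates, targets, eps=0.1, step=0.01, iters=200):
    # Sign-gradient descent on the ensemble loss, clipped to ||delta||_inf <= eps.
    # The lowest-loss perturbation seen so far is kept, starting from delta = 0.
    delta = np.zeros_like(x_wm)
    best_delta = delta.copy()
    best_loss = ensemble_loss(x_wm, surrogates, targets)
    for _ in range(iters):
        g = ensemble_grad(x_wm + delta, surrogates, targets)
        delta = np.clip(delta - step * np.sign(g), -eps, eps)
        loss = ensemble_loss(x_wm + delta, surrogates, targets)
        if loss < best_loss:
            best_delta, best_loss = delta.copy(), loss
    return best_delta

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
d, n_bits, k = 256, 8, 3
x_wm = rng.normal(scale=0.05, size=d)                   # stand-in "watermarked image"
surrogates = [rng.normal(size=(n_bits, d)) for _ in range(k)]
targets = [1 - decode_bits(W, x_wm) for W in surrogates]  # flip every decoded bit

delta = transfer_perturbation(x_wm, surrogates, targets)
for W, t in zip(surrogates, targets):
    flipped = int((decode_bits(W, x_wm + delta) == t).sum())
    print(f"bits flipped on this surrogate: {flipped}/{n_bits}")
```

The sketch only shows the optimization against the surrogates. Whether such a perturbation also evades an unseen target model is the transferability question the paper studies, and this toy example does not demonstrate it.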
Authors:
- Yuepeng Hu
- Zhengyuan Jiang
- Moyang Guo
- Neil Gong