Leveraging Optimization for Adaptive Attacks on Image Watermarks (2309.16952v2)
Abstract: Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in unethical activities. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. When evaluating watermarking algorithms and their (adaptive) attacks, it is challenging to determine whether an adaptive attack is optimal, i.e., the best possible attack. We solve this problem by defining an objective function and approaching adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack's parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods with no visible degradation in image quality. Optimizing our attacks is efficient and requires less than 1 GPU hour to reduce the detection accuracy to 6.3% or less. Our findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
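The surrogate-key idea from the abstract can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the paper's actual method: the watermark decoder is reduced to a linear model `W` on a 64-dimensional vector (the paper trains differentiable surrogate decoders against Stable Diffusion watermarks), and "image quality" is approximated by an L2 perturbation budget. The point is only to show the optimization loop: descend the surrogate's detection score while projecting the perturbation back onto a quality budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a linear surrogate "watermark decoder" W scores an
# image vector x; detection succeeds when W @ x > 0. The real secret key
# is unknown to the attacker; W stands in for a locally trained surrogate.
d = 64
W = rng.normal(size=d)             # surrogate key (toy stand-in)
x = rng.normal(size=d) + 0.5 * W   # "watermarked" vector (toy embedding)

def evade(x, W, budget=1.0, steps=200, lr=0.05):
    """Gradient-descent evasion: drive the surrogate detection score below
    zero while keeping the L2 perturbation within a (toy) quality budget."""
    delta = np.zeros_like(x)
    cap = budget * np.linalg.norm(x)
    for _ in range(steps):
        if W @ (x + delta) <= 0:   # surrogate no longer detects the mark
            break
        delta -= lr * W            # gradient of W @ (x + delta) w.r.t. delta is W
        norm = np.linalg.norm(delta)
        if norm > cap:
            delta *= cap / norm    # project back onto the perturbation budget
    return x + delta

x_adv = evade(x, W)
```

In the paper's setting, the same loop runs over image pixels (or latents), the surrogate is a trained neural decoder, and the quality constraint is perceptual rather than a raw L2 ball; the sketch keeps only the optimization skeleton.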
- Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), pp. 1615–1631, 2018.
- Diffusion-based data augmentation for skin disease classification: Impact across original medical datasets to fully synthetic images. arXiv preprint arXiv:2301.04802, 2023.
- Certified neural network watermarks with randomized smoothing. In International Conference on Machine Learning, pp. 1450–1465. PMLR, 2022.
- Identifying and mitigating the security risks of generative AI. arXiv preprint arXiv:2308.14840, 2023.
- How relevant is the Turing test in the age of sophisbots? IEEE Security & Privacy, 17(6):64–71, 2019.
- InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18392–18402, 2023.
- Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, 2017.
- Digital watermarking and steganography. Morgan Kaufmann, 2007.
- DiffusionShield: A watermark for copyright protection against generative diffusion models. arXiv preprint arXiv:2306.04642, 2023.
- DALL·E Mini, July 2021. URL https://github.com/borisdayma/dalle-mini.
- Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
- Federal Register. Safe, secure, and trustworthy development and use of artificial intelligence. https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence, 2023. Accessed 4 Nov. 2023.
- The stable signature: Rooting watermarks in latent diffusion models. arXiv preprint arXiv:2303.15435, 2023.
- Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
- Identifying AI-generated images with SynthID, 2023. URL https://www.deepmind.com/blog/identifying-ai-generated-images-with-synthid. Accessed: 2023-09-23.
- The ethical need for watermarks in machine-generated language. arXiv preprint arXiv:2209.03118, 2022.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
- Evading watermark based detection of AI-generated content. arXiv preprint arXiv:2305.03807, 2023.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer, 2014.
- DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.
- PTW: Pivotal tuning watermarking for pre-trained image generators. In 32nd USENIX Security Symposium, 2023.
- SoK: How robust is image classification deep neural network watermarking? In 2022 IEEE Symposium on Security and Privacy (SP), pp. 787–804. IEEE, 2022.
- The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021.
- Protecting the intellectual property of diffusion models by the watermark diffusion process. arXiv preprint arXiv:2306.03436, 2023.
- On ChatGPT and beyond: How generative artificial intelligence may affect research, teaching, and practice. International Journal of Research in Marketing, 2023.
- SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
- Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pp. 8748–8763. PMLR, 2021.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.
- Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
- LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256–2265. PMLR, 2015.
- Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM International Conference on Multimedia Retrieval, pp. 269–277, 2017.
- Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. arXiv preprint arXiv:2305.20030, 2023.
- Responsible disclosure of generative models using scalable fingerprinting. arXiv preprint arXiv:2012.08726, 2020.
- Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14448–14457, 2021.
- Robust invisible video watermarking with attention. arXiv preprint arXiv:1909.01285, 2019.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595, 2018.
- A recipe for watermarking diffusion models. arXiv preprint arXiv:2303.10137, 2023.