Investigating Deep Watermark Security: An Adversarial Transferability Perspective (2402.16397v1)
Abstract: The rise of generative neural networks has increased the demand for intellectual property (IP) protection of generated content. Deep watermarking techniques, recognized for their flexibility in IP protection, have attracted significant attention. However, the surge in transferable adversarial attacks poses unprecedented challenges to the security of deep watermarking techniques, an area that currently lacks systematic investigation. This study fills this gap by introducing two effective transferable attackers to assess the vulnerability of deep watermarks to erasure and tampering risks. Specifically, we first define the concept of local sample density and use it to derive theorems on the consistency of model outputs. Having found that perturbing samples towards high sample density regions (HSDR) of the target class enhances targeted adversarial transferability, we propose the Easy Sample Selection (ESS) mechanism and the Easy Sample Matching Attack (ESMA) method. Additionally, we propose Bottleneck Enhanced Mixup (BEM), which integrates information bottleneck theory to reduce the generator's dependence on irrelevant noise. Experiments show a significant improvement in the success rate of targeted transfer attacks for both ESMA and BEM-ESMA. We further conduct a comprehensive evaluation using ESMA and BEM-ESMA as measurement tools, considering model architecture and watermark encoding length, and obtain several notable findings.
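The abstract does not state the formal definition of local sample density or the exact ESS procedure, so the following is only an illustrative sketch under stated assumptions. It uses a generic k-nearest-neighbour density proxy (the inverse of the mean distance to the k nearest same-class neighbours in some feature space) and picks the highest-density samples of a target class as "easy" samples. The function names `knn_density` and `select_easy_samples`, the choice of Euclidean distance, and the parameter `k` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_density(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Assumed density proxy: 1 / (mean distance to the k nearest neighbours)."""
    n = features.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to estimate density")
    k = min(k, n - 1)  # a sample cannot have more than n - 1 neighbours
    # Pairwise Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each sample's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def select_easy_samples(
    features: np.ndarray,
    labels: np.ndarray,
    target_class: int,
    num_select: int,
    k: int = 5,
) -> np.ndarray:
    """Return indices of the target-class samples with the highest density proxy."""
    class_idx = np.flatnonzero(labels == target_class)
    density = knn_density(features[class_idx], k=k)
    order = np.argsort(-density)  # highest density first
    return class_idx[order[:num_select]]
```

In this sketch, samples returned by `select_easy_samples` would play the role of matching targets located in a high sample density region of the target class; how the attack actually perturbs inputs towards them is not described in the abstract and is left out here.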