Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing (2405.06340v1)
Abstract: This paper aims to enhance the transferability of adversarial examples in targeted attacks, where attack success rates remain comparatively low. To this end, we propose two distinct techniques that improve targeted transferability from the loss and feature perspectives. First, logit calibrations used in previous targeted attacks focus primarily on the logit margin between the target class and the untargeted classes, neglecting the standard deviation of the logits. In contrast, we introduce a normalized logit calibration method that jointly considers the logit margin and the standard deviation of the logits, which calibrates the logits more effectively and enhances targeted transferability. Second, previous studies have shown that mixing the features of clean samples into the optimization can significantly increase transferability. Building upon this, we investigate a truncated feature mixing method that reduces the influence of the source model, yielding further improvements. The truncated feature is obtained by removing the Rank-1 component associated with the largest singular value, decomposed from the high-level convolutional feature of the clean sample. Extensive experiments on the ImageNet-Compatible and CIFAR-10 datasets demonstrate the individual and combined benefits of the two proposed components, which outperform state-of-the-art methods by a large margin in black-box targeted attacks.
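To make the first component concrete, the sketch below gives one plausible reading of a normalized logit calibration: a targeted margin loss in which the gap between the target logit and the largest non-target logit is divided by the per-sample standard deviation of the logits. The function name `normalized_logit_margin_loss` and the exact form of the normalization are illustrative assumptions, not the paper's verified formulation.

```python
import torch

def normalized_logit_margin_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Minimal sketch of a normalized logit-calibration loss.

    Assumption: the target-vs-best-other logit margin is scaled by the
    per-sample standard deviation of the logits; the paper's exact
    formulation may differ.
    """
    # Per-sample standard deviation of the logits, shape (B,).
    std = logits.std(dim=1)
    # Logit of the target class, shape (B,).
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    # Largest non-target logit: mask the target class out first.
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))
    best_other = masked.max(dim=1).values
    # Negative normalized margin: minimizing it pushes the target logit
    # above all other logits, measured in units of the logit spread.
    return -((target_logit - best_other) / (std + 1e-12)).mean()
```

In a standard iterative attack loop, one would compute `loss = normalized_logit_margin_loss(surrogate(adv_images), target_labels)`, backpropagate, and step the perturbation; dividing by the logit spread keeps the margin objective on a comparable scale across samples.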
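For the second component, the following sketch shows one way to truncate a clean feature map in the RankFeat spirit: remove the Rank-1 term associated with the largest singular value from each sample's (C, H×W) feature matrix, then mix the result into the adversarial feature at the chosen layer. The mixing rule, the coefficient `alpha`, and the function names are illustrative assumptions; the paper's exact layer choice and mixing strategy may differ.

```python
import torch

def truncate_rank1(feat: torch.Tensor) -> torch.Tensor:
    """Remove the Rank-1 component tied to the largest singular value
    from a conv feature map of shape (B, C, H, W).

    Assumption: the SVD is taken per sample over the (C, H*W) matrix,
    in the spirit of RankFeat; the paper's details may differ.
    """
    b, c, h, w = feat.shape
    mat = feat.reshape(b, c, h * w)
    # Batched reduced SVD: u (B, C, k), s (B, k), vh (B, k, H*W).
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    # Rank-1 term of the largest singular value, shape (B, C, H*W).
    rank1 = s[:, :1, None] * (u[:, :, :1] @ vh[:, :1, :])
    return (mat - rank1).reshape(b, c, h, w)

def mix_truncated_features(adv_feat: torch.Tensor,
                           clean_feat: torch.Tensor,
                           alpha: float = 0.1) -> torch.Tensor:
    # Hypothetical mixing rule: convex combination of the adversarial
    # feature with the truncated clean feature; alpha is illustrative.
    return (1.0 - alpha) * adv_feat + alpha * truncate_rank1(clean_feat)
```

Removing the dominant Rank-1 direction strips the component of the clean feature most tied to the source model's dominant response, which matches the abstract's motivation of reducing the source model's influence during feature mixing.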