Rethinking Model Ensemble in Transfer-based Adversarial Attacks (2303.09105v2)

Published 16 Mar 2023 in cs.CV

Abstract: It is widely recognized that deep learning models lack robustness to adversarial examples. An intriguing property of adversarial examples is that they can transfer across different models, which enables black-box attacks without any knowledge of the victim model. An effective strategy to improve the transferability is attacking an ensemble of models. However, previous works simply average the outputs of different models, lacking an in-depth analysis on how and why model ensemble methods can strongly improve the transferability. In this paper, we rethink the ensemble in adversarial attacks and define the common weakness of model ensemble with two properties: 1) the flatness of loss landscape; and 2) the closeness to the local optimum of each model. We empirically and theoretically show that both properties are strongly correlated with the transferability and propose a Common Weakness Attack (CWA) to generate more transferable adversarial examples by promoting these two properties. Experimental results on both image classification and object detection tasks validate the effectiveness of our approach to improving the adversarial transferability, especially when attacking adversarially trained models. We also successfully apply our method to attack a black-box large vision-language model -- Google's Bard, showing the practical effectiveness. Code is available at https://github.com/huanranchen/AdversarialAttacks.

Authors (6)
  1. Huanran Chen (21 papers)
  2. Yichi Zhang (184 papers)
  3. Yinpeng Dong (102 papers)
  4. Xiao Yang (158 papers)
  5. Hang Su (224 papers)
  6. Jun Zhu (424 papers)
Citations (39)

Summary

  • The paper presents a novel Common Weakness Attack (CWA) that targets the common weaknesses of a model ensemble, characterized by loss-landscape flatness and closeness to each model's local optimum.
  • The method integrates Sharpness Aware Minimization and Cosine Similarity Encourager, increasing attack success rates by up to 30%.
  • Empirical validation on 31 victim models, including adversarially trained and vision-language systems, demonstrates the approach's robust performance.

Rethinking Model Ensemble in Transfer-based Adversarial Attacks

This paper, authored by Chen et al., presents an in-depth analysis of model ensemble methods in the context of transfer-based adversarial attacks. It addresses a significant gap in the understanding of how and why model ensemble methods enhance transferability. The authors propose a novel attack method, termed Common Weakness Attack (CWA), which targets common weaknesses across models to generate more transferable adversarial examples.

Problem Statement and Motivation

Deep neural networks are known to be vulnerable to adversarial examples, which are inputs modified by subtle perturbations that can mislead the model's predictions. The transferability of these adversarial examples across different models can facilitate black-box attacks, where the adversary has no direct access to the victim model. Traditionally, ensemble methods improve transferability by averaging outputs from various models. However, there has been limited exploration of the underlying principles that make this approach effective.
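
As a concrete point of reference, the sketch below (in PyTorch) shows what such a loss-averaging ensemble baseline typically looks like when combined with a momentum iterative update. The function name, hyperparameter values, and plain loss average are illustrative assumptions, not any specific paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ensemble_mi_fgsm(x, y, models, eps=8/255, alpha=2/255, steps=10, mu=1.0):
    """Illustrative loss-averaging ensemble attack with a momentum iterative
    (MI-FGSM style) update. Hyperparameters are common defaults, not a
    prescription from the paper."""
    x = x.detach()
    x_adv = x.clone()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Average the cross-entropy losses of all surrogate models.
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Accumulate momentum on the (L1-normalized) gradient.
        g = mu * g + grad / grad.abs().mean().clamp_min(1e-12)
        # Ascend the loss, then project back into the epsilon-ball and [0, 1].
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```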

Methodological Contributions

The key contribution is the introduction of the concept of common weaknesses in model ensembles. The authors propose two properties that characterize these weaknesses:

  1. Flatness of the Loss Landscape: A flatter loss landscape indicates better generalization and transferability.
  2. Closeness to the Local Optimum of Each Model: The proximity of the adversarial example to each surrogate model's local optimum boosts transferability (both properties are sketched more formally below).
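
One way to make these two properties concrete is the rough formalization below; the notation (per-surrogate losses L_i, perturbation radius rho, local optima p_i) is chosen here for illustration and is not quoted from the paper.

```latex
% Rough formalization (notation chosen here, not quoted from the paper).
% Property 1 (flatness): for an untargeted attack that ascends the averaged
% surrogate loss, the loss should stay high over a whole neighborhood of the
% adversarial example x^*, not only at x^* itself:
\max_{x^*}\;\min_{\|\delta\|\le\rho}\;\frac{1}{n}\sum_{i=1}^{n} L_i\!\left(x^*+\delta,\,y\right)
% Property 2 (closeness): x^* should lie near the local optimum p_i of each
% surrogate's own attack objective, i.e. the average squared distance stays small:
\frac{1}{n}\sum_{i=1}^{n}\bigl\|\,x^* - p_i\,\bigr\|_2^2 \;\approx\; 0
```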

To promote these two properties, the authors construct the Common Weakness Attack (CWA) from two sub-methods:

  • Sharpness Aware Minimization (SAM): Aimed at flattening the loss landscape to improve generalization.
  • Cosine Similarity Encourager (CSE): Encourages closeness to local optima by maximizing cosine similarity of gradients between models.

The integration of these methods into existing attacks, such as momentum iterative methods, results in enhanced adversarial transferability.
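
To illustrate how the two components could fit together, here is a rough sketch of a single outer step in the same PyTorch style. The reverse-step size r, the inner step scaling beta, and the sequential per-model inner loop are assumptions made for illustration; the authors' exact algorithm is in the linked repository.

```python
import torch
import torch.nn.functional as F

def cwa_style_outer_step(x_adv, x, y, models, eps=8/255, alpha=2/255,
                         r=16/255/15, beta=50.0):
    """Illustrative outer step: a SAM-like reverse step (promoting a flat
    loss landscape) followed by sequential per-model inner updates (which,
    in the spirit of the Cosine Similarity Encourager, favor directions on
    which the surrogate gradients agree). Not the authors' exact algorithm."""
    # Reverse step: move a short distance against the averaged ascent
    # direction so the subsequent updates land in a flat high-loss region.
    x_adv = x_adv.detach().requires_grad_(True)
    loss = sum(F.cross_entropy(m(x_adv), y) for m in models) / len(models)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_inner = (x_adv - r * grad.sign()).detach()

    # Sequential inner updates, one surrogate at a time; stepping on each
    # model's own gradient in turn implicitly rewards examples whose
    # gradients are aligned across models (the "common weakness").
    for m in models:
        x_inner.requires_grad_(True)
        g_i = torch.autograd.grad(F.cross_entropy(m(x_inner), y), x_inner)[0]
        step = (beta / len(models)) * alpha * g_i / g_i.abs().mean().clamp_min(1e-12)
        x_inner = (x_inner + step).detach()

    # Project back into the epsilon-ball around the clean input x.
    return torch.clamp(torch.min(torch.max(x_inner, x - eps), x + eps), 0, 1).detach()
```

In the full method these components are folded into momentum-based attacks such as MI-FGSM; the sketch above omits momentum and other details for brevity.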

Empirical Validation

The effectiveness of CWA is validated through experiments on image classification, object detection, and an innovative test on a large vision-language model. The method significantly improves attack success rates across 31 diverse victim models. Notably, against adversarially trained models and state-of-the-art defenses, CWA shows a marked increase in attack success rates, by as much as 30% in some cases. These results underscore the importance of targeting common weaknesses in adversarial attack strategies.

Implications and Future Directions

The proposed CWA method is robust across various tasks and models, highlighting its potential as a tool for evaluating model robustness. The insights gained from the study of common weaknesses can inform the development of more resilient defense strategies. Moreover, the versatility of the CWA algorithm suggests its applicability in areas beyond adversarial attacks, potentially influencing the design of models with improved generalization capabilities.

In future research, the exploration of adaptive defense mechanisms that can identify and guard against attacks exploiting common weaknesses will be essential. Additionally, as AI systems are increasingly deployed in safety-critical applications, continual enhancements in understanding model vulnerabilities will remain a critical area of investigation. This work contributes a foundational understanding of ensemble methods in adversarial contexts, providing a basis for future advancements in both attack and defense strategies in machine learning.
