Rethinking Model Ensemble in Transfer-based Adversarial Attacks (2303.09105v2)
Abstract: It is widely recognized that deep learning models lack robustness to adversarial examples. An intriguing property of adversarial examples is that they can transfer across different models, which enables black-box attacks without any knowledge of the victim model. An effective strategy to improve transferability is attacking an ensemble of models. However, previous works simply average the outputs of different models, without an in-depth analysis of how and why model ensemble methods can strongly improve transferability. In this paper, we rethink the ensemble in adversarial attacks and define the common weakness of a model ensemble with two properties: 1) the flatness of the loss landscape; and 2) the closeness to the local optimum of each model. We empirically and theoretically show that both properties are strongly correlated with transferability and propose a Common Weakness Attack (CWA) to generate more transferable adversarial examples by promoting these two properties. Experimental results on both image classification and object detection tasks validate the effectiveness of our approach in improving adversarial transferability, especially when attacking adversarially trained models. We also successfully apply our method to attack a black-box large vision-language model, Google's Bard, demonstrating its practical effectiveness. Code is available at \url{https://github.com/huanranchen/AdversarialAttacks}.
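To make the two properties concrete, below is a minimal, hedged PyTorch sketch of how an ensemble attack promoting them might be structured; the authors' exact update rules are in their repository linked above, and names such as `reverse_step_size` and the specific normalization are illustrative assumptions, not the paper's definitive algorithm. Flatness is encouraged with a sharpness-aware reverse step on the ensemble loss, and closeness to each model's optimum with sequential per-model gradient steps.

```python
# Hedged sketch of a CWA-style ensemble attack (untargeted, L-infinity).
# NOT the authors' exact algorithm; see the official repo for that.
import torch
import torch.nn.functional as F

def cwa_style_attack(models, x, y, eps=8/255, alpha=2/255,
                     reverse_step_size=1/255, steps=10, mu=1.0):
    """models: list of classifiers; x: inputs in [0,1]; y: true labels."""
    x_adv = x.clone().detach()
    momentum = torch.zeros_like(x)
    for _ in range(steps):
        # 1) Reverse (descent) step on the summed ensemble loss, analogous to
        #    sharpness-aware minimization, so the subsequent ascent steps
        #    settle on a flat region of the loss landscape.
        x_adv.requires_grad_(True)
        loss = sum(F.cross_entropy(m(x_adv), y) for m in models)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv - reverse_step_size * grad.sign()).detach()

        # 2) Sequential per-model ascent steps (Reptile-like inner loop) push
        #    the example toward a point close to every model's local optimum.
        for m in models:
            x_adv.requires_grad_(True)
            g = torch.autograd.grad(F.cross_entropy(m(x_adv), y), x_adv)[0]
            momentum = mu * momentum + g / (g.abs().mean() + 1e-12)
            x_adv = (x_adv + alpha * momentum.sign()).detach()

        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```

As a usage sketch, `cwa_style_attack([resnet, vit], images, labels)` would craft perturbations against a two-model ensemble; the sequential inner loop (rather than averaging gradients once) is what distinguishes this structure from a plain ensemble MI-FGSM baseline.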
Authors: Huanran Chen, Yichi Zhang, Yinpeng Dong, Xiao Yang, Hang Su, Jun Zhu