Attacking Transformers with Feature Diversity Adversarial Perturbation (2403.07942v1)

Published 10 Mar 2024 in cs.CR and cs.CV

Abstract: Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, even for models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose the feature diversity attacker to naturally accelerate this process and achieve remarkable performance and transferability.
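The core idea in the abstract — perturbing the input to accelerate feature collapse rather than to flip a label — can be sketched as a label-free gradient step on a feature-diversity loss. The snippet below is an illustrative toy, not the paper's implementation: `extract_features` stands in for a ViT block as a plain linear map, diversity is measured as the distance of token features from their collapsed (mean-token) component, and the gradient is estimated by finite differences instead of autograd. All names and parameters here are hypothetical.

```python
import numpy as np

def feature_diversity(feats):
    # feats: (tokens, dim). Measure diversity as the distance of the token
    # features from their "collapsed" rank-one component (the mean token),
    # following the feature-collapse view described in the abstract.
    mean = feats.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(feats - mean))

def extract_features(x, W):
    # Toy stand-in for a ViT feature extractor: one linear map per token.
    return x @ W

def diversity_attack(x, W, eps=0.01, h=1e-5):
    # Label-free, FGSM-style step: move the input in the direction that
    # DECREASES feature diversity, i.e. accelerates collapse. The gradient
    # of the diversity loss is estimated by central finite differences;
    # a real implementation would use autograd on the actual model.
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp = x.copy(); xp[idx] += h
        xm = x.copy(); xm[idx] -= h
        grad[idx] = (feature_diversity(extract_features(xp, W))
                     - feature_diversity(extract_features(xm, W))) / (2 * h)
    # Sign step against the gradient, bounded by the budget eps.
    return x - eps * np.sign(grad)
```

Because the loss never references a label, the same perturbation recipe applies to any model whose intermediate features can be read out, which is the intuition behind the transferability claim.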

