
Random Position Adversarial Patch for Vision Transformers (2307.04066v1)

Published 9 Jul 2023 in cs.CV

Abstract: Previous studies have shown that vision transformers are vulnerable to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. This stringent requirement makes adversarial patches impractical to deploy against vision transformers in the physical world, in contrast to their effectiveness against CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show the effectiveness of the adversarial patch in achieving universal attacks on vision transformers in both digital and physical-world scenarios. Further analysis reveals that the generated patch is robust to brightness restriction, color transfer, and random noise. Real-world attack experiments validate the ability of the G-Patch to launch robust attacks even under challenging conditions.
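The core idea the abstract describes, training a generator to produce a single patch that fools a frozen ViT no matter where it lands, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the generator architecture, patch size, latent dimension, target class, and the `train_loader` of clean images are hypothetical placeholders, not the paper's exact setup. The key mechanism is that the patch is pasted at an independently sampled location in each image, so gradients only reward patches that succeed at arbitrary positions.

```python
# Minimal sketch of GAN-style random-position patch training.
# Hypothetical setup: generator design, patch size, target class,
# and train_loader are illustrative, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm  # pretrained ViTs, as used in the paper's experiments

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen victim classifier; vit_base_patch16_224 is an arbitrary choice.
victim = timm.create_model("vit_base_patch16_224", pretrained=True).to(device)
victim.eval()
for p in victim.parameters():
    p.requires_grad_(False)

class PatchGenerator(nn.Module):
    """Maps a latent vector to an RGB patch in [0, 1]; architecture is illustrative."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),  # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),    # 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),  # 64x64
        )

    def forward(self, z):
        return self.net(z)

def paste_at_random_position(images, patch):
    """Overwrite each image with the patch at an independently sampled location."""
    b, _, h, w = images.shape
    ph, pw = patch.shape[-2:]
    out = images.clone()
    for i in range(b):
        y = torch.randint(0, h - ph + 1, (1,)).item()
        x = torch.randint(0, w - pw + 1, (1,)).item()
        out[i, :, y:y + ph, x:x + pw] = patch
    return out

gen = PatchGenerator().to(device)
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
z = torch.randn(1, 128, device=device)  # fixed latent: one patch is learned
target_class = 859                       # arbitrary target label for illustration

# train_loader is assumed to yield clean images in [0, 1] (normalization omitted).
for images, _ in train_loader:
    images = images.to(device)
    patch = gen(z)[0]                              # (3, 64, 64)
    adv = paste_at_random_position(images, patch)  # random placement per image
    logits = victim(adv)
    # Targeted attack: push every prediction toward target_class,
    # regardless of where the patch was placed.
    targets = torch.full((images.size(0),), target_class, device=device)
    loss = F.cross_entropy(logits, targets)
    opt.zero_grad()
    loss.backward()  # only the generator receives gradients
    opt.step()
```

Because placement is resampled at every step, the generator cannot exploit alignment with the ViT's projection grid; it must find a patch that transfers across positions, which is the property the paper's G-Patch targets.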
