Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective (2208.09602v2)
Abstract: The Vision Transformer has emerged as a powerful tool for image classification, surpassing the performance of convolutional neural networks (CNNs). Recently, many researchers have attempted to understand the robustness of Transformers against adversarial attacks, but previous studies have focused solely on perturbations in the spatial domain. This paper proposes an additional perspective that explores the adversarial robustness of Transformers against frequency-selective perturbations in the spectral domain. To facilitate comparison between the two domains, an attack framework is formulated as a flexible tool for implementing attacks on images in both the spatial and spectral domains. The experiments reveal that Transformers rely more on phase and low-frequency information, which can make them more vulnerable to frequency-selective attacks than CNNs. This work offers new insights into the properties and adversarial robustness of Transformers.
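To make the spectral-domain idea concrete, below is a minimal sketch of a frequency-selective perturbation in PyTorch. This is not the paper's actual attack framework: the function name, the circular low-pass mask, and the `radius` parameter are illustrative assumptions chosen to show how a perturbation can be confined to a low-frequency band via the 2D FFT.

```python
import torch

def frequency_selective_perturbation(image, delta, radius=0.25):
    """Add a perturbation only inside a low-frequency band of an image.

    A minimal sketch (illustrative, not the paper's method): transform the
    image with a 2D FFT, add `delta` inside a centered low-frequency disk,
    and transform back.

    image:  (C, H, W) float tensor in [0, 1]
    delta:  (C, H, W) real-valued perturbation applied in the spectral domain
    radius: fraction of the half-spectrum (from DC) treated as "low frequency"
    """
    _, H, W = image.shape
    # Shift DC to the center so a centered mask selects low frequencies.
    spec = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))

    # Circular low-pass mask around the spectrum center.
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (dist <= radius * min(H, W) / 2).float()

    # Perturb only the selected (low-frequency) coefficients.
    spec = spec + mask * delta

    out = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
    return out.clamp(0.0, 1.0)

# Example: perturb only the low-frequency band of a random image.
x = torch.rand(3, 224, 224)
delta = 0.05 * torch.randn(3, 224, 224)  # illustrative scale, not optimized
x_adv = frequency_selective_perturbation(x, delta)
```

An actual attack would optimize `delta` against the classifier's loss (e.g., by gradient ascent, as in PGD-style attacks) rather than sampling it randomly; the point of the sketch is only that the mask restricts where in the spectrum the perturbation can act.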
Authors: Gihyun Kim, Juyeop Kim, Jong-Seok Lee