FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers

Published 3 Jan 2024 in cs.CV (arXiv:2401.01752v2)

Abstract: In recent years, the Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks, and the robustness of the model has received increasing attention. However, existing large models tend to prioritize performance during training, potentially neglecting robustness, which may lead to serious security concerns. In this paper, we establish a new challenge: exploring how to use a small number of additional parameters for adversarial finetuning to quickly and effectively enhance the adversarial robustness of a standardly trained model. To address this challenge, we develop a novel LNLoRA module, which incorporates a learnable layer normalization before the conventional LoRA module and helps mitigate magnitude differences in parameters between the adversarial and standard training paradigms. Furthermore, we propose the FullLoRA framework, which integrates the learnable LNLoRA modules into all key components of ViT-based models while keeping the pretrained model frozen, significantly improving model robustness via adversarial finetuning in a parameter-efficient manner. Extensive experiments on several datasets demonstrate the superiority of our proposed FullLoRA framework. It achieves robustness comparable to full finetuning while requiring only about 5% of the learnable parameters. This also effectively addresses concerns regarding the extra model storage space and enormous training time incurred by adversarial finetuning.
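To make the idea concrete, the following is a minimal NumPy sketch of the LNLoRA forward pass as the abstract describes it: a frozen pretrained projection plus a trainable low-rank update, with a learnable layer normalization applied to the input of the low-rank branch. The exact wiring (where the normalization sits, the scaling convention) is an assumption here, not the paper's verbatim formulation; all function and variable names are hypothetical.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Learnable layer normalization over the last (feature) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def lnlora_forward(x, W_frozen, A, B, gamma, beta, alpha=1.0):
    """Hypothetical LNLoRA forward pass: frozen dense path plus a
    low-rank update computed on a learnably normalized input.

    x: (batch, d_in), W_frozen: (d_out, d_in) -- kept frozen.
    A: (r, d_in), B: (d_out, r), gamma/beta: (d_in,) -- all trainable.
    """
    r = A.shape[0]                                    # low-rank bottleneck
    frozen = x @ W_frozen.T                           # pretrained path
    update = layer_norm(x, gamma, beta) @ A.T @ B.T   # trainable branch
    return frozen + (alpha / r) * update

# Parameter-count comparison for a single d_in -> d_out projection,
# illustrating the roughly-5%-of-parameters claim at the module level.
d_in, d_out, r = 768, 768, 8
full_params = d_in * d_out
lnlora_params = r * (d_in + d_out) + 2 * d_in         # A, B, gamma, beta
print(full_params, lnlora_params, round(lnlora_params / full_params, 4))
# -> 589824 13824 0.0234
```

Applying such modules to all key ViT components (attention projections, MLP layers, and so on) while freezing `W_frozen` everywhere is, per the abstract, what distinguishes FullLoRA from inserting LoRA into the attention weights alone.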
