
Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks (2404.08894v2)

Published 13 Apr 2024 in cs.CV and cs.LG

Abstract: Low-rank adaptation (LoRA) has shifted the paradigm of adapting pre-trained Vision Transformers (ViT), achieving great efficiency by updating only a small set of tailored parameters that approximate the weight updates. However, the multi-head design of self-attention poses a problem: the heads run in parallel in the computation flow, often exhibit similar visual patterns, and are all updated under standard LoRA, incurring unnecessary storage and computational overhead. In this paper, we propose Head-level responsiveness tuning for low-rank adaptation (Heart-LoRA). The proposed method exploits redundancy among the heads and selectively activates only task-responsive heads, enabling fine-grained head-level tuning. Moreover, because heads respond differently to different visual tasks, Heart-LoRA dynamically activates the subset of approximated heads tailored to the current task. Experimental results show that Heart-LoRA outperforms state-of-the-art parameter-efficient transfer learning (PETL) approaches on visual adaptation benchmark datasets.
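The core idea described in the abstract — applying a low-rank weight update only to a selected subset of attention heads while leaving the rest frozen — can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function name, tensor layout, and the pre-computed `active_mask` are all assumptions for the sake of the example (the paper determines head responsiveness dynamically per task).

```python
import numpy as np

def head_lora_forward(x, W_heads, A_heads, B_heads, active_mask):
    """Per-head projection with selective low-rank updates (illustrative sketch).

    x:           (tokens, d_model) input activations
    W_heads:     (n_heads, d_model, d_head) frozen pre-trained per-head weights
    A_heads:     (n_heads, r, d_model) LoRA down-projection factors
    B_heads:     (n_heads, d_head, r) LoRA up-projection factors
    active_mask: (n_heads,) booleans marking the task-responsive heads
    """
    outputs = []
    for h, active in enumerate(active_mask):
        W = W_heads[h]
        if active:
            # Only responsive heads receive the low-rank update B @ A,
            # which approximates the full weight change for this head.
            W = W + (B_heads[h] @ A_heads[h]).T  # (d_model, d_head)
        outputs.append(x @ W)                    # (tokens, d_head)
    # Inactive heads reuse the pre-trained weights unchanged, so no
    # adapter parameters need to be stored or trained for them.
    return np.stack(outputs, axis=0)             # (n_heads, tokens, d_head)
```

Compared with vanilla LoRA, which attaches rank-`r` factors to the full (all-head) projection matrix, this head-level variant stores factors only for the activated heads, which is where the storage savings described in the abstract come from.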

