Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks (2404.08894v2)
Abstract: Low-rank adaptation (LoRA) has shifted the paradigm of adapting pre-trained Vision Transformers (ViTs), achieving great efficiency by updating only a small set of tailored parameters that approximate the weight updates. However, in the multi-head self-attention mechanism the heads run in parallel, often exhibit similar visual patterns, and are nonetheless all updated, incurring unnecessary storage and computational overhead. In this paper, we propose Head-level responsiveness tuning for low-rank adaptation (Heart-LoRA). The proposed method exploits redundancy among the heads and selectively activates task-responsive heads, enabling fine-grained head-level tuning. Moreover, since heads respond differently to diverse visual tasks, our method dynamically activates the subset of approximated heads tailored to the current task. Experimental results show that Heart-LoRA outperforms state-of-the-art PETL approaches on visual adaptation benchmark datasets.
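The core idea of head-level adaptation can be illustrated with a minimal sketch: attach one low-rank pair (A, B) to each attention head of a frozen projection, and apply the update only to heads flagged as task-responsive. This is a simplified NumPy illustration under assumed dimensions; the head mask, the selection criterion, and all names here are assumptions for exposition, not the paper's exact Heart-LoRA formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 4 heads, head dim 8, rank 2.
num_heads, head_dim, rank = 4, 8, 2
d_model = num_heads * head_dim
seq_len = 5

# Frozen pre-trained query projection (viewed per head after reshape).
W_q = rng.standard_normal((d_model, d_model)) * 0.02

# One low-rank pair (A_h, B_h) per head; as in standard LoRA, B starts at
# zero, so an inactive or untrained head leaves the output unchanged.
A = rng.standard_normal((num_heads, rank, d_model)) * 0.02
B = np.zeros((num_heads, head_dim, rank))

def query_projection(x, head_mask):
    """x: (seq, d_model); head_mask: (num_heads,) booleans selecting which
    heads receive a low-rank update. Returns (seq, num_heads, head_dim)."""
    base = (x @ W_q.T).reshape(x.shape[0], num_heads, head_dim)
    for h in range(num_heads):
        if head_mask[h]:
            # Head-level LoRA update: add B_h A_h x to head h only.
            base[:, h, :] += x @ A[h].T @ B[h].T
    return base

x = rng.standard_normal((seq_len, d_model))
frozen = (x @ W_q.T).reshape(seq_len, num_heads, head_dim)

# Pretend head 0 was selected as task-responsive and trained.
B[0] = rng.standard_normal((head_dim, rank)) * 0.1
out = query_projection(x, np.array([True, False, False, False]))
# Only head 0 deviates from the frozen projection; the other heads are
# untouched, which is what saves per-head storage and compute.
```

Compared with tuning every head, only the selected heads store trainable parameters (rank × (d_model + head_dim) values per active head in this sketch), which is the storage saving the abstract refers to.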