Asymmetry in Low-Rank Adapters of Foundation Models (2402.16842v2)

Published 26 Feb 2024 in cs.LG

Abstract: Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training $B$ improves the bound. We support our conclusions with experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
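As a concrete illustration of the asymmetry described in the abstract, the snippet below sketches a LoRA-style linear layer in PyTorch in which the pretrained weight and a random $A$ are frozen and only $B$ is trained. This is a minimal sketch under assumed conventions (the class name LoRALinear, the rank and alpha hyperparameters, and zero initialization of $B$ are illustrative choices), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a low-rank update W + B A.

    Following the asymmetry observation, A is a fixed random projection
    (never trained) and only B receives gradients.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        in_f, out_f = base.in_features, base.out_features
        # A: random, frozen "feature extractor" of shape (rank, in_features)
        self.A = nn.Parameter(torch.randn(rank, in_f) / rank ** 0.5,
                              requires_grad=False)
        # B: trainable, zero-initialized so the adapter starts as a no-op
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scaling * x A^T B^T, i.e. applying W + scaling * B A
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


# Usage: wrap an existing layer; only B (out_features x rank) is trainable.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))  # 768 * 8 = 6144 parameters (B only)
```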

Authors (9)
  1. Jiacheng Zhu (54 papers)
  2. Kristjan Greenewald (65 papers)
  3. Kimia Nadjahi (13 papers)
  4. Haitz Sáez de Ocáriz Borde (26 papers)
  5. Rickard Brüel Gabrielsson (4 papers)
  6. Leshem Choshen (78 papers)
  7. Marzyeh Ghassemi (96 papers)
  8. Mikhail Yurochkin (68 papers)
  9. Justin Solomon (86 papers)