A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA (2312.03732v1)

Published 28 Nov 2023 in cs.CL and cs.LG

Abstract: As LLMs have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.
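The abstract identifies a single change to LoRA: the low-rank adapter product is scaled by a factor of alpha divided by the square root of the rank, rather than alpha divided by the rank. Below is a minimal PyTorch sketch of that difference. It is an illustrative reimplementation based only on the abstract, not the paper's reference code; the class name, parameter names, and initialization constants are assumptions.

```python
# Minimal sketch of the scaling change described in the abstract: standard LoRA
# scales the adapter product B @ A by alpha / r, while rsLoRA scales it by
# alpha / sqrt(r). Names and initialization constants are illustrative.
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0,
                 rank_stabilized: bool = True):
        super().__init__()
        self.base = base
        # Freeze the pretrained layer; only the adapter matrices are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        in_f, out_f = base.in_features, base.out_features
        # Low-rank adapter delta_W = B @ A, with B initialized to zero so the
        # wrapped layer starts out identical to the base layer.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # The only difference between the two methods is this scaling factor.
        self.scaling = alpha / math.sqrt(rank) if rank_stabilized else alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


# Example: wrap a projection layer with a higher-rank adapter, which the
# abstract argues becomes effective once the sqrt(r) scaling is used.
layer = LoRALinear(nn.Linear(768, 768), rank=64, alpha=16.0)
```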
