LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning (2308.03303v1)

Published 7 Aug 2023 in cs.CL

Abstract: The low-rank adaptation (LoRA) method can greatly reduce the number of trainable parameters for fine-tuning LLMs; however, it still requires expensive activation memory to update the low-rank weights. Reducing the number of LoRA layers or using activation recomputation could harm fine-tuning performance or increase computational overhead. In this work, we present LoRA-FA, a memory-efficient fine-tuning method that reduces activation memory without performance degradation or expensive recomputation. LoRA-FA freezes the projection-down weight $A$ and updates only the projection-up weight $B$ in each LoRA layer. This ensures that the change in model weights resides in a low-rank space during LLM fine-tuning, while eliminating the need to store full-rank input activations. We conduct extensive experiments across multiple model types (RoBERTa, T5, LLaMA) and model scales. Our results show that LoRA-FA consistently achieves fine-tuning accuracy close to that of full-parameter fine-tuning and LoRA across different tasks. Furthermore, LoRA-FA can reduce the overall memory cost by up to 1.4$\times$ compared to LoRA.
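The following is a minimal PyTorch sketch of the idea described in the abstract, not the authors' released implementation: the base weight and the projection-down weight $A$ are frozen, and only the projection-up weight $B$ is trained. The class name, initialization scale, and the `rank`/`alpha` defaults are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRAFALinear(nn.Module):
    """Linear layer with a LoRA-FA adapter: A is frozen at initialization,
    and only the projection-up weight B is trained (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weight stays frozen

        in_f, out_f = base.in_features, base.out_features
        # Projection-down weight A: random init (as in LoRA), then frozen.
        self.A = nn.Parameter(torch.randn(rank, in_f) / rank, requires_grad=False)
        # Projection-up weight B: zero init, the only trainable adapter weight.
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # With W and A both frozen, no weight gradient needs the full-rank
        # input x; grad(B) only needs the rank-r activation x @ A^T, which
        # is the source of the activation-memory saving.
        return self.base(x) + (x @ self.A.T) @ self.B.T * self.scaling


# Usage sketch: wrap an existing projection and optimize only B.
layer = LoRAFALinear(nn.Linear(4096, 4096), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]  # just B
```

The point of freezing $A$ is that autograd then never needs the full-rank input to form a weight gradient, so only the rank-$r$ projection of the activations has to be kept for the backward pass.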

Authors (5)
  1. Longteng Zhang (4 papers)
  2. Lin Zhang (342 papers)
  3. Shaohuai Shi (47 papers)
  4. Xiaowen Chu (108 papers)
  5. Bo Li (1107 papers)
Citations (71)