LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning (2308.03303v1)

Published 7 Aug 2023 in cs.CL

Abstract: The low-rank adaptation (LoRA) method can greatly reduce the number of trainable parameters for fine-tuning LLMs; however, it still requires expensive activation memory to update the low-rank weights. Reducing the number of LoRA layers or using activation recomputation could harm fine-tuning performance or increase computational overhead. In this work, we present LoRA-FA, a memory-efficient fine-tuning method that reduces activation memory without performance degradation or expensive recomputation. LoRA-FA freezes the projection-down weight $A$ and updates only the projection-up weight $B$ in each LoRA layer. This ensures that the change in model weights resides in a low-rank space during LLM fine-tuning, while eliminating the need to store full-rank input activations. We conduct extensive experiments across multiple model types (RoBERTa, T5, LLaMA) and model scales. Our results show that LoRA-FA consistently achieves fine-tuning accuracy close to that of full-parameter fine-tuning and LoRA across different tasks. Furthermore, LoRA-FA can reduce the overall memory cost by up to 1.4$\times$ compared to LoRA.
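The following is a minimal PyTorch sketch of the idea described in the abstract, not the authors' released implementation: the base weight and the projection-down weight $A$ are frozen, and only the projection-up weight $B$ is trained. The class name, initialization scale, and the `rank`/`alpha` defaults are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRAFALinear(nn.Module):
    """Linear layer with a LoRA-FA adapter: A is frozen at initialization,
    and only the projection-up weight B is trained (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weight stays frozen

        in_f, out_f = base.in_features, base.out_features
        # Projection-down weight A: random init (as in LoRA), then frozen.
        self.A = nn.Parameter(torch.randn(rank, in_f) / rank, requires_grad=False)
        # Projection-up weight B: zero init, the only trainable adapter weight.
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # With W and A both frozen, no weight gradient needs the full-rank
        # input x; grad(B) only needs the rank-r activation x @ A^T, which
        # is the source of the activation-memory saving.
        return self.base(x) + (x @ self.A.T) @ self.B.T * self.scaling


# Usage sketch: wrap an existing projection and optimize only B.
layer = LoRAFALinear(nn.Linear(4096, 4096), rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]  # just B
```

The point of freezing $A$ is that autograd then never needs the full-rank input to form a weight gradient, so only the rank-$r$ projection of the activations has to be kept for the backward pass.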

Authors (5)
  1. Longteng Zhang (4 papers)
  2. Lin Zhang (342 papers)
  3. Shaohuai Shi (47 papers)
  4. Xiaowen Chu (108 papers)
  5. Bo Li (1107 papers)
Citations (71)