
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters (2405.17604v2)

Published 27 May 2024 in cs.LG, cs.AI, and cs.CL

Abstract: The rapid expansion of LLMs has underscored the need for parameter-efficient fine-tuning methods, with LoRA (Low-Rank Adaptation) emerging as a popular solution. Although LoRA reduces the number of trainable parameters, serving multiple (task or user-specific) LoRA modules on top of a base model still creates significant storage challenges. To address this, using theoretical derivation, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel low-rank adaptation method that considerably reduces the trainable parameters while showing superior or competitive performance. LoRA-XS achieves this by inserting a small, trainable r x r weight matrix between frozen low-rank matrices, which are constructed by Singular Value Decomposition (SVD) of the original weight matrix. This lightweight matrix enables fine-tuning with drastically reduced storage requirements, making it feasible to deploy millions of personalized models while minimizing memory overhead. For instance, LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA. Our evaluations across various benchmarks (including GLUE, GSM8K, MATH, and eight commonsense reasoning datasets) demonstrate that LoRA-XS performs competitively or better than LoRA and other recent methods like VeRA while being significantly more parameter efficient. We also provide an extensive ablation study on the importance of singular vectors in transformer weights, shedding light on the underlying mechanisms driving LoRA-XS's enhanced efficiency. These findings suggest that LoRA-XS is not only a storage-efficient alternative, but also a powerful tool for scaling and personalizing LLMs at unprecedented scales.

The paper "LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters" addresses a critical concern in the field of NLP and, specifically, in the adaptation of LLMs. The proposed method, LoRA-XS, enhances parameter-efficient tuning approaches by significantly reducing the number of trainable parameters while maintaining competitive performance across various NLP benchmarks.

In recent years, the advent and scaling of LLMs have driven remarkable advancements in NLP. However, the sheer size of these models introduces extensive challenges, particularly when it comes to fine-tuning them for specific downstream tasks. Traditional fine-tuning methods necessitate updating a vast number of parameters, leading to substantial computational and storage demands. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) emerged as viable solutions, significantly reducing the number of trainable parameters. Despite their success, even methods like LoRA face storage challenges, especially when handling large-scale personalized or task-specific models.

Methodology

LoRA-XS leverages Singular Value Decomposition (SVD) to address these limitations. The core idea is to place a small r × r trainable weight matrix between frozen low-rank matrices derived via SVD from the original pretrained weight matrix. This setup decouples the number of trainable parameters from the model dimensions, yielding a substantial reduction in trainable parameters, by over 100x in specific cases. The LoRA-XS method comprises the following key steps:

  1. SVD-Based Initialization: The pretrained weight matrix W undergoes truncated SVD, decomposing it into matrices U, Σ, and V.
  2. Frozen Adaptation Matrices: The decomposition yields low-rank adaptation matrices A and B, which are kept fixed during training.
  3. Introduction of a Trainable Matrix: A small r × r trainable matrix R is placed between A and B, acting as the only trainable component and thereby drastically reducing the parameter count.
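
The three steps above can be sketched in a few lines of NumPy. The dimensions, the rank r, and the zero initialization of R are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for a single pretrained weight matrix (e.g. one
# attention projection); real model dimensions and the rank r are choices.
d_out, d_in, r = 64, 64, 4
W = rng.standard_normal((d_out, d_in))

# Step 1: truncated SVD of the pretrained weight, W ≈ U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Step 2: frozen adaptation matrices built from the top-r singular components.
A = U[:, :r] * S[:r]          # (d_out, r), frozen during training
B = Vt[:r, :]                 # (r, d_in), frozen during training

# Step 3: the only trainable component, an r x r matrix R (zero init is an
# assumption here; it makes the adapted model start at the pretrained weights).
R = np.zeros((r, r))

def adapted_forward(x):
    """Compute y = (W + A @ R @ B) @ x; only R would receive gradients."""
    return (W + A @ R @ B) @ x

x = rng.standard_normal(d_in)
# With R = 0 the adapter is a no-op: the output matches the frozen model.
assert np.allclose(adapted_forward(x), W @ x)
```

Because A and B are fixed, only the r² entries of R need to be stored per task or user, which is what makes serving many personalized adapters cheap.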

This novel approach not only improves parameter efficiency but also enhances flexibility, as the number of trainable parameters can be precisely controlled based on the downstream task requirements.

Experiments and Results

GLUE Benchmark

The paper evaluates LoRA-XS on the GLUE benchmark using the RoBERTa-large model and compares it against full fine-tuning (FT), LoRA, and VeRA. The results reveal that LoRA-XS with ranks ranging from 4 to 25 outperforms both LoRA and VeRA in parameter efficiency while maintaining high performance. For instance, LoRA-XS with a rank of 16 achieved superior accuracy over VeRA and LoRA, with a 2.5x and 30x reduction in trainable parameters, respectively.
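
These efficiency ratios follow from the per-matrix parameter counts: LoRA trains two factors of combined size r × (d_in + d_out), while LoRA-XS trains only the r × r matrix. A back-of-the-envelope sketch (d = 1024 is RoBERTa-large's hidden size; the ranks below are illustrative, not the paper's exact configurations):

```python
# Trainable parameters per adapted weight matrix, for a square d x d weight.
d = 1024  # RoBERTa-large hidden size

def lora_params(r, d_in=d, d_out=d):
    # LoRA trains two low-rank factors: B (d_out x r) and A (r x d_in).
    return r * (d_in + d_out)

def lora_xs_params(r):
    # LoRA-XS trains only the r x r matrix between frozen SVD factors.
    return r * r

print(lora_params(8))       # 16384 trainable params per matrix
print(lora_xs_params(16))   # 256, and independent of d entirely
```

The key structural point is that the LoRA-XS count depends only on r, so the ratio over LoRA keeps growing as the model's hidden size grows.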

Instruction Tuning

The method's efficacy in instruction tuning is validated on the Mistral-7B and Gemma-7B models, trained on the MetaMathQA dataset and evaluated on the GSM8K and MATH benchmarks. LoRA-XS demonstrated performance competitive with both full fine-tuning and LoRA while drastically reducing trainable parameters: for example, LoRA-XS with only 0.92M parameters performed comparably to LoRA with 168M parameters on both benchmarks.

Ablation Study

An important aspect of the paper is the ablation study comparing SVD-based initialization with random initialization. The results underscore the benefit of aligning the adaptation matrices with the top principal components of the pretrained weights: SVD-based initialization not only led to better final performance but also accelerated convergence, particularly at smaller ranks.
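
One way to see why SVD-based initialization helps: with frozen factors taken from the top-r singular components, the trainable R can express the best possible rank-r approximation of the target matrix, which randomly initialized frozen factors generally cannot. A small NumPy illustration of this geometric point (not the paper's training ablation protocol):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 64, 4
W = rng.standard_normal((d, d))

# Frozen factors from truncated SVD (top-r principal components of W).
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A_svd, B_svd = U[:, :r] * S[:r], Vt[:r, :]

# Randomly initialized frozen factors of the same shapes (ablation baseline).
A_rnd, B_rnd = rng.standard_normal((d, r)), rng.standard_normal((r, d))

def best_fit_error(A, B, target):
    """Smallest ||target - A R B||_F over R, using the closed form R = A+ W B+."""
    R = np.linalg.pinv(A) @ target @ np.linalg.pinv(B)
    return np.linalg.norm(target - A @ R @ B)

# By the Eckart-Young theorem, the SVD factors achieve the optimal rank-r
# error, so they are never worse than the random baseline.
assert best_fit_error(A_svd, B_svd, W) <= best_fit_error(A_rnd, B_rnd, W)
```

This captures in miniature why aligning the frozen subspace with the weight's principal components gives R more useful directions to work with.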

Implications and Future Directions

LoRA-XS's implications are substantial for both theoretical and practical aspects of AI and NLP. The method addresses the scalability issue of PEFT methods by making trainable parameters independent of model dimensions, particularly beneficial for large-scale models. From a practical standpoint, this reduces the computational and storage overhead, paving the way for more efficient deployments of LLMs in real-world applications.

The theoretical implication lies in the innovative use of SVD for initializing adaptation matrices, suggesting a shift towards more informed initialization strategies in deep learning. Future research can explore extending LoRA-XS to other architectures and tasks, including reinforcement learning and multimodal models. Also, further studies could investigate the joint application of LoRA-XS with other memory-saving techniques like model quantization.

Conclusion

In conclusion, LoRA-XS presents a significant advancement in parameter-efficient fine-tuning. By leveraging SVD and introducing a minimal number of trainable parameters, it addresses the critical scalability and storage challenges faced by LLMs. The empirical results across various benchmarks and models highlight its efficiency and potential, marking a meaningful contribution to the field of NLP and model optimization.

Authors (4)
  1. Klaudia Bałazy (8 papers)
  2. Mohammadreza Banaei (8 papers)
  3. Karl Aberer (44 papers)
  4. Jacek Tabor (106 papers)
Citations (9)