- The paper introduces Tied-LoRA, which combines weight tying with selective training to reduce trainable parameters while preserving performance across NLP tasks.
- The method ties the low-rank projection matrices across layers and selectively freezes components, cutting the number of trainable parameters and the resources needed for fine-tuning.
- Experiments show up to an 87.5% reduction in trainable parameters with comparable or improved results on tasks such as translation and summarization.
Tied-LoRA: Enhancing Parameter Efficiency of LoRA with Weight Tying
Introduction
The paper "Tied-LoRA: Enhancing Parameter Efficiency of LoRA with Weight Tying" (2311.09578) introduces Tied-LoRA, a refined approach that synergizes weight tying and selective training to improve the parameter efficiency of Low-rank Adaptation (LoRA). This methodology seeks to balance performance with the number of trainable parameters across diverse NLP tasks and foundational LLMs, thereby minimizing computational demands while maintaining or enhancing task performance.
Methodology
Tied-LoRA extends LoRA by tying the low-rank matrices across all model layers and selectively freezing or training the remaining components. Because a single pair of low-rank matrices is shared by every layer rather than each layer carrying its own, the trainable parameter count drops substantially relative to standard LoRA, while the model remains adaptable to a range of downstream tasks.
Formulation: Tied-LoRA generalizes the standard low-rank update by tying the projection matrices so that a single pair is shared by all layers of the model. For a linear layer with frozen pretrained weight matrix W, the output is:
$$z = Wx + \Delta W x \approx Wx + \Lambda_v B \Lambda_u A x$$
where A and B are the low-rank projection matrices (tied across layers), and Λ_u and Λ_v are diagonal matrices formed from the trainable scaling vectors u and v.
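As a concrete illustration, below is a minimal PyTorch-style sketch of this update. It is not the authors' implementation: the class name TiedLoRALinear, the initialization choices, and the layer wiring are assumptions made for readability.

```python
import torch
import torch.nn as nn

class TiedLoRALinear(nn.Module):
    """Hypothetical sketch of one linear layer with a Tied-LoRA update:
    the low-rank matrices A and B are shared (tied) across layers, while
    the scaling vectors u and v are owned by each layer."""

    def __init__(self, weight: torch.Tensor, shared_A: nn.Parameter,
                 shared_B: nn.Parameter, rank: int):
        super().__init__()
        d_out, d_in = weight.shape
        # Frozen pretrained weight W.
        self.weight = nn.Parameter(weight, requires_grad=False)
        # Tied low-rank projections: the same Parameter objects in every layer
        # (A: rank x d_in, B: d_out x rank).
        self.A = shared_A
        self.B = shared_B
        # Per-layer scaling vectors; Lambda_u = diag(u), Lambda_v = diag(v).
        self.u = nn.Parameter(torch.ones(rank))
        self.v = nn.Parameter(torch.ones(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # z = W x + Lambda_v B Lambda_u A x
        base = x @ self.weight.t()
        delta = (((x @ self.A.t()) * self.u) @ self.B.t()) * self.v
        return base + delta

# Tying in practice: every layer receives the *same* A and B parameters.
# B starts at zero (LoRA-style), so the update contributes nothing before training.
d_in, d_out, rank, n_layers = 512, 512, 8, 4
shared_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
shared_B = nn.Parameter(torch.zeros(d_out, rank))
layers = [TiedLoRALinear(torch.randn(d_out, d_in), shared_A, shared_B, rank)
          for _ in range(n_layers)]
```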
Selective Training: Tied-LoRA defines a family of configurations in which each component of the update (the tied matrices A and B and the scaling vectors u and v) can independently be trained or frozen, so the trainable budget can be matched to the task, as sketched below.
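Continuing the hypothetical sketch above, one simple way to express such configurations is to toggle gradients on each component. The helper name and the flag combination shown are illustrative assumptions, not the paper's TL configuration naming.

```python
def configure_trainable(layers, shared_A, shared_B,
                        train_A=True, train_B=True,
                        train_u=True, train_v=True):
    """Illustrative helper: choose which Tied-LoRA components receive gradients.
    The paper's configurations correspond to different train/freeze choices;
    the exact combinations here are assumptions for demonstration."""
    shared_A.requires_grad_(train_A)
    shared_B.requires_grad_(train_B)
    for layer in layers:
        layer.u.requires_grad_(train_u)
        layer.v.requires_grad_(train_v)

# Example: freeze the tied down-projection A (kept at its random init) and
# train only B plus the per-layer scaling vectors, a low-parameter setting.
configure_trainable(layers, shared_A, shared_B,
                    train_A=False, train_B=True, train_u=True, train_v=True)
```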
Experimental Setup
The paper uses two base models, NVIDIA's GPT-2B-001 and Meta's LLaMA 2 7B, to evaluate Tied-LoRA across multiple NLP tasks: extractive question answering, summarization, natural language inference, translation, and mathematical reasoning. These tasks reflect common real-world customization scenarios, making the results broadly applicable.
Results
The experimental results highlight that Tied-LoRA retains performance levels comparable to standard LoRA while drastically reducing parameter usage. Notably:
- Translation: The TL6 (vB_u A_g) configuration matched and in some cases outperformed LoRA while using only 12.5% of LoRA's trainable parameters.
- Efficiency: Configurations such as TL5 and TL6 consistently delivered large parameter savings without significant performance drops, making them cost-effective alternatives, especially for large models; a rough parameter-count sketch follows this list.
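To make the scale of the savings concrete, here is a back-of-the-envelope comparison of trainable-parameter counts. The dimensions and the accounting of the scaling vectors are assumptions rather than the paper's exact settings, so the resulting ratio only illustrates why fractions in this range are achievable.

```python
# Rough trainable-parameter comparison (assumed dimensions, not the paper's):
# standard LoRA trains a separate (A, B) pair per adapted layer, while
# Tied-LoRA shares one pair across layers and trains per-layer scaling vectors.
d, r, n_layers = 4096, 8, 32

lora_params = n_layers * (2 * d * r)            # per-layer A (r x d) and B (d x r)
tied_params = 2 * d * r + n_layers * (r + d)    # shared A, B + per-layer u, v

print(f"LoRA trainable params:      {lora_params:,}")
print(f"Tied-LoRA trainable params: {tied_params:,}")
print(f"fraction of LoRA:           {tied_params / lora_params:.1%}")
```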
Task-Dependent Optimal Rank
Ablation experiments show that the optimal rank varies from task to task, contradicting the assumption that higher ranks always yield better performance. Rank should therefore be chosen per task, which lets Tied-LoRA spend its small trainable-parameter budget where it matters most.
Conclusion
Tied-LoRA is an effective method for parameter-efficient fine-tuning of LLMs, delivering large reductions in trainable parameters while remaining competitive with LoRA across a range of NLP tasks. This makes it a practical choice for applications that require many task-specific customizations of a single base model. Future work could extend Tied-LoRA to larger models and combine it with other parameter-efficient methods such as Adapters and Prefix Tuning.