PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

Published 24 Feb 2024 in cs.LG | (2402.16902v2)

Abstract: With the rapid scaling of LLMs, serving numerous low-rank adaptations (LoRAs) concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy. As a superset of LoRA, PRoLoRA retains its advantages and effectively circumvents the drawbacks of peer parameter-sharing methods, offering superior model capacity, practical feasibility, and broad applicability. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA in both fixed-parameter-budget and fixed-performance-target scenarios, as well as its scalability to larger LLMs. Notably, with half the trainable parameters, PRoLoRA still outperforms LoRA on multiple instruction tuning datasets. An ablation study further validates the necessity of the individual components and highlights the superiority of PRoLoRA over three potential variants. We hope its conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.

References (37)
  1. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255.
  2. Sahil Chaudhary. 2023. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca.
  3. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
  4. Llava-mole: Sparse mixture of lora experts for mitigating data conflicts in instruction finetuning mllms. arXiv preprint arXiv:2401.16160.
  5. Netgpt: A native-ai network architecture beyond provisioning personalized generative services. arXiv preprint arXiv:2307.06148.
  6. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
  7. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics, 8:454–470.
  8. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  9. Universal transformers. arXiv preprint arXiv:1807.03819.
  10. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314.
  11. Edgeformer: A parameter-efficient transformer for on-device seq2seq generation. arXiv preprint arXiv:2202.07959.
  12. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034.
  13. Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR).
  14. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR.
  15. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
  16. Lorahub: Efficient cross-task generalization via dynamic lora composition. arXiv preprint arXiv:2307.13269.
  17. Vera: Vector-based random matrix adaptation. arXiv preprint arXiv:2310.11454.
  18. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles.
  19. P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 61–68.
  20. The flan collection: Designing data and methods for effective instruction tuning. arXiv preprint arXiv:2301.13688.
  21. Dictformer: Tiny transformer with shared dictionary. In International Conference on Learning Representations.
  22. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft.
  23. OpenAI. 2023. Gpt-4 technical report.
  24. One wide feedforward is all you need. arXiv preprint arXiv:2309.01826.
  25. Subformer: Exploring weight sharing for parameter efficiency in generative transformers. arXiv preprint arXiv:2101.00234.
  26. Tied-LoRA: Enhancing parameter efficiency of LoRA with weight tying. arXiv preprint arXiv:2311.09578.
  27. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
  28. Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261.
  29. Sho Takase and Shun Kiyono. 2023. Lessons on parameter sharing across layers in transformers. In Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 78–90, Toronto, Canada (Hybrid). Association for Computational Linguistics.
  30. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.
  31. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  32. How far can camels go? exploring the state of instruction tuning on open resources. arXiv preprint arXiv:2306.04751.
  33. Super-naturalinstructions: Generalization via declarative instructions on 1600+ nlp tasks. arXiv preprint arXiv:2204.07705.
  34. Aligning large language models with human: A survey. arXiv preprint arXiv:2307.12966.
  35. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
  36. Adaptive budget allocation for parameter-efficient fine-tuning. arXiv preprint arXiv:2303.10512.
  37. Instruction tuning for large language models: A survey. arXiv preprint arXiv:2308.10792.

Summary

  • The paper introduces PRoLoRA, demonstrating double the parameter efficiency over traditional LoRA methods while maintaining superior performance.
  • The method employs innovative techniques such as broadcast reduction, rotation enhancement, and partially-sharing refinement to optimize low-rank adaptations.
  • Empirical results on LLaMA2 models validate PRoLoRA's scalability and resource efficiency, making it ideal for personalized and multitask model customization.


Introduction

The paper introduces Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), a method for parameter-efficient finetuning of LLMs such as LLaMA2. Finetuning LLMs across multiple domains incurs substantial resource costs, especially when many LoRA instances must be served concurrently for personalized or multitask use. PRoLoRA addresses these challenges with an intra-layer sharing mechanism built from four components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy.

PRoLoRA builds upon the Low-Rank Adaptation (LoRA) approach in several ways, enhancing both capacity and practicality while maintaining broad applicability. The empirical results demonstrate superior performance with significantly fewer trainable parameters, making PRoLoRA an ideal candidate for resource-efficient model customization.

Method

PRoLoRA enhances parameter efficiency through careful reparameterization of low-rank matrices and leveraging intra-layer sharing mechanisms. The main components are detailed as follows:

Broadcast Reduction

This process partitions the low-rank matrices into smaller chunks and broadcasts the first chunk across the remaining segments. As illustrated in Figure 1, this reduces the number of trainable parameters by sharing them across the expanded matrices, resulting in higher parameter efficiency.

Figure 1: Illustration of the original LoRA, our proposed PRoLoRA, and their intermediate states (i.e., CLoRA and RoLoRA). Here the rank r, unshared rank u, and sharing rates m and n of the A and B matrices are set to 4, 1, 2, and 3, respectively.
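The core of broadcast reduction can be sketched in a few lines: only a small chunk of each low-rank matrix is trainable, and the full-rank matrix is materialized by tiling that chunk. The function name, shapes, and tiling axis below are our assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def broadcast_reduce(chunk: np.ndarray, m: int) -> np.ndarray:
    """Tile one trainable chunk m times along the rank axis.

    `chunk` has shape (r // m, d); the result has shape (r, d),
    but only (r // m) * d parameters are actually trainable.
    """
    return np.tile(chunk, (m, 1))

# A LoRA-style A matrix of rank 4 built from a rank-1 chunk (sharing rate m = 4):
chunk = np.arange(6, dtype=float).reshape(1, 6)   # 1 x 6 trainable parameters
A = broadcast_reduce(chunk, m=4)                  # 4 x 6 effective matrix
assert A.shape == (4, 6)
assert np.allclose(A[0], A[3])                    # rows are shared copies
```

With a sharing rate of m, the trainable parameter count of the matrix shrinks by roughly a factor of m, at the cost of exact replication across chunks, which the next component addresses.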

Rotation Enhancement

To address issues related to reduced expressiveness due to simple replication, PRoLoRA introduces rotation enhancement by applying a cost-free rotation to differentiate broadcast chunks. This increases the representational capacity without adding extra parameters and refines the weight difference matrix.
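A "cost-free" way to differentiate replicated chunks is a circular shift along the hidden dimension, which adds no parameters. The sketch below uses `np.roll` with a stride we chose for illustration; the paper's exact rotation scheme may differ.

```python
import numpy as np

def rotate_chunks(chunk: np.ndarray, m: int) -> np.ndarray:
    """Broadcast `chunk` m times, circularly shifting each copy along
    the hidden dimension so the copies are no longer identical."""
    d = chunk.shape[1]
    stride = d // m                     # hypothetical choice of rotation step
    copies = [np.roll(chunk, i * stride, axis=1) for i in range(m)]
    return np.concatenate(copies, axis=0)

chunk = np.arange(8, dtype=float).reshape(1, 8)
A = rotate_chunks(chunk, m=4)
assert A.shape == (4, 8)
assert not np.allclose(A[0], A[1])                 # copies differ after rotation
assert np.allclose(np.sort(A[0]), np.sort(A[1]))   # but share the same values
```

Because each row is a permutation of the same parameters, the trainable count is unchanged while block-wise symmetry in the effective matrix is broken.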

Partially-Sharing Refinement

Incorporating unshared parameters into the low-rank matrices allows for more refined matrix expressiveness and improved parameter efficiency. This mechanism differentiates PRoLoRA from CLoRA and RoLoRA methods by providing enhanced adaptability and reduced block-wise symmetry in matrices.
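Combining the two ideas, a PRoLoRA-style matrix stacks rotated copies of a shared chunk with a few fully unshared ranks. The layout and configuration below (shared rank 1 replicated 3 times plus u = 1 unshared rank) are a hypothetical sketch, not the paper's exact construction.

```python
import numpy as np

def prolora_matrix(shared_chunk: np.ndarray, unshared: np.ndarray,
                   m: int) -> np.ndarray:
    """Stack m rotated copies of the shared chunk with u unshared rows."""
    d = shared_chunk.shape[1]
    stride = d // m
    shared = np.concatenate(
        [np.roll(shared_chunk, i * stride, axis=1) for i in range(m)], axis=0)
    return np.concatenate([shared, unshared], axis=0)

rng = np.random.default_rng(0)
shared_chunk = rng.normal(size=(1, 8))   # 1 shared rank, replicated m = 3 times
unshared = rng.normal(size=(1, 8))       # u = 1 fully unshared rank
A = prolora_matrix(shared_chunk, unshared, m=3)
assert A.shape == (4, 8)                 # total rank r = m * 1 + u = 4
```

The unshared ranks give each layer some capacity that is not tied to the shared chunk, which is what distinguishes PRoLoRA from the purely shared CLoRA and RoLoRA variants.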

Rectified Initialization Strategy

PRoLoRA adopts a rectified Kaiming uniform initialization for shared chunks, ensuring unified bounds and facilitating effective optimization during training. This careful initialization aids in maintaining superior performance when deploying the proposed sharing mechanisms.
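One reading of "unified bounds" is that the shared chunk should be initialized with the bound a Kaiming-uniform init would use for the full matrix's fan-in, rather than the smaller chunk's own fan-in. The sketch below encodes that interpretation; treat the bound formula and naming as our assumption.

```python
import math
import numpy as np

def rectified_kaiming_uniform(shape, full_fan_in: int, rng) -> np.ndarray:
    """Kaiming-uniform init whose bound is computed from the FULL
    matrix's fan-in rather than the smaller shared chunk's, so shared
    and unshared parts draw from the same range."""
    bound = math.sqrt(6.0 / full_fan_in)   # standard Kaiming-uniform bound
    return rng.uniform(-bound, bound, size=shape)

rng = np.random.default_rng(0)
d_in = 4096
chunk = rectified_kaiming_uniform((1, d_in), full_fan_in=d_in, rng=rng)
assert chunk.shape == (1, d_in)
assert np.all(np.abs(chunk) <= math.sqrt(6.0 / d_in))
```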

Results and Analysis

Experiments on the LLaMA2-7B and 13B models demonstrate PRoLoRA's scalable efficiency. PRoLoRA consistently outperformed LoRA under matched parameter budgets, achieving roughly double the parameter efficiency in practical settings. Under identical budget constraints, it also achieved the best average performance against the LoRA, VeRA, and Tied-LoRA baselines.

Unshared Rank and Learning Rate

Analysis of PRoLoRA under varied unshared ranks and learning rates showed consistent superiority over LoRA and other alternatives, as depicted in Figure 2.

Figure 2: Performance of PRoLoRA with rank 32 with respect to unshared ranks and learning rates, given a specific parameter budget, on the LLaMA2-7B model and the BBH benchmark.

Scalability to Larger Models

PRoLoRA maintains efficient performance with larger models such as LLaMA2-13B, highlighting its adaptability for scaling without compromising resource efficiency or performance.

Conclusion

PRoLoRA offers a comprehensive framework for parameter-efficient adaptation of LLMs through innovative intra-layer sharing mechanisms. Its application significantly alleviates memory burdens and enhances scalability while providing improved performance. PRoLoRA stands as a resource-friendly alternative to existing methods and possesses potential for further integration with other sharing frameworks. The paper suggests promising directions for future developments in efficient model customization, particularly in personalized and multitask scenarios.
