PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA (2402.16902v2)

Published 24 Feb 2024 in cs.LG

Abstract: With the rapid scaling of LLMs, serving numerous low-rank adaptations (LoRAs) concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction, rotation enhancement, partially-sharing refinement, and a rectified initialization strategy. As a superset of LoRA, PRoLoRA retains its advantages and effectively circumvents the drawbacks of peer parameter-sharing methods, offering superior model capacity, practical feasibility, and broad applicability. Empirical experiments demonstrate the remarkably higher parameter efficiency of PRoLoRA in both fixed-parameter-budget and fixed-performance-target scenarios, as well as its scalability to larger LLMs. Notably, with half the trainable parameters, PRoLoRA still outperforms LoRA on multiple instruction tuning datasets. An ablation study further validates the necessity of each component and highlights the superiority of PRoLoRA over three potential variants. We hope its conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.
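The abstract names four components (broadcast reduction, rotation enhancement, partially-sharing refinement, and rectified initialization) without giving their exact formulation here. The sketch below is only an illustration of how such an intra-layer sharing scheme could be wired up in PyTorch, based solely on the component names: small shared chunks are tiled (broadcast) to form the low-rank factors, each replica is "rotated" via a circular shift (an assumption; the paper's rotation may be defined differently), a small unshared portion provides the partially-sharing refinement, and the A factors use a Kaiming-style rectified initialization. The class name PRoLoRALinearSketch and all hyperparameters (rank split, chunk count, scaling) are hypothetical and not taken from the paper.

```python
import math
import torch
import torch.nn as nn


class PRoLoRALinearSketch(nn.Module):
    """Illustrative PRoLoRA-style adapter delta (a sketch, not the paper's code).

    The low-rank factors A and B are assembled from small shared chunks that are
    broadcast (tiled) across the hidden dimension, with a circular shift applied
    to each replica so replicas are not identical, plus a small unshared part.
    """

    def __init__(self, in_features, out_features, rank=8,
                 shared_rank=6, num_chunks=4, alpha=16):
        super().__init__()
        assert in_features % num_chunks == 0 and out_features % num_chunks == 0
        self.num_chunks = num_chunks
        self.scaling = alpha / rank
        unshared_rank = rank - shared_rank

        # Shared chunks, broadcast num_chunks times to cover the full dimension.
        self.A_shared = nn.Parameter(torch.empty(shared_rank, in_features // num_chunks))
        self.B_shared = nn.Parameter(torch.zeros(out_features // num_chunks, shared_rank))

        # Unshared refinement: a small, fully independent low-rank part.
        self.A_free = nn.Parameter(torch.empty(unshared_rank, in_features))
        self.B_free = nn.Parameter(torch.zeros(out_features, unshared_rank))

        # Rectified (Kaiming) initialization for A, zeros for B, so the adapter
        # contributes nothing at the start of training, as in vanilla LoRA.
        nn.init.kaiming_uniform_(self.A_shared, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.A_free, a=math.sqrt(5))

    def _broadcast_with_rotation(self, chunk, roll_dim, cat_dim):
        # Tile the chunk num_chunks times along cat_dim; circularly shift each
        # replica along the rank axis (roll_dim) so the replicas differ.
        replicas = [torch.roll(chunk, shifts=i, dims=roll_dim)
                    for i in range(self.num_chunks)]
        return torch.cat(replicas, dim=cat_dim)

    def forward(self, x):
        # Assemble full-size factors from the shared and unshared parts, then
        # return the low-rank delta that would be added to the frozen layer's output.
        A = torch.cat([self._broadcast_with_rotation(self.A_shared, roll_dim=0, cat_dim=1),
                       self.A_free], dim=0)                     # (rank, in_features)
        B = torch.cat([self._broadcast_with_rotation(self.B_shared, roll_dim=1, cat_dim=0),
                       self.B_free], dim=1)                     # (out_features, rank)
        return (x @ A.T @ B.T) * self.scaling
```

Under this sketch's assumptions, only the shared chunks and the small unshared part are trainable, so the parameter count of the shared portion shrinks roughly by the chunk count; this is the intuition behind the abstract's claim that PRoLoRA can match or beat LoRA with a substantially smaller trainable-parameter budget.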
