
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation (2401.11316v1)

Published 20 Jan 2024 in cs.CL and cs.AI

Abstract: With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address these challenges, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
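The abstract names two mechanisms: a linearly increasing per-layer rank allocation and in-training pruning driven jointly by weight magnitude and accumulated input statistics. The sketch below is a minimal illustration of one way these ideas could be realized, not the authors' implementation; the rank range, the EMA of squared inputs, the importance score |A| · sqrt(E[x²]), and the 50% sparsity default are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def linear_rank_schedule(num_layers: int, r_min: int = 4, r_max: int = 12) -> list[int]:
    """Assign a linearly increasing LoRA rank from the lowest to the highest layer."""
    if num_layers == 1:
        return [r_max]
    return [round(r_min + (r_max - r_min) * i / (num_layers - 1)) for i in range(num_layers)]


class PrunedLoRAAdapter(torch.nn.Module):
    """Low-rank update (B @ A) whose A entries are pruned by an importance score."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_features, rank))
        # Running estimate of the squared input -- a stand-in for the
        # "accumulated statistics of the input" mentioned in the abstract.
        self.register_buffer("input_sq_ema", torch.zeros(in_features))
        self.register_buffer("mask", torch.ones(rank, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                batch_sq = x.detach().pow(2).reshape(-1, x.shape[-1]).mean(dim=0)
                self.input_sq_ema.mul_(0.9).add_(batch_sq, alpha=0.1)
        # Only the low-rank update is shown; the frozen base projection W0 @ x is omitted.
        return F.linear(x, self.B @ (self.A * self.mask))

    @torch.no_grad()
    def prune(self, sparsity: float = 0.5) -> None:
        """Zero out the lowest-importance entries of A (assumed score: |A| * sqrt(E[x^2]))."""
        importance = self.A.abs() * self.input_sq_ema.sqrt()
        k = int(sparsity * importance.numel())
        if k > 0:
            threshold = importance.flatten().kthvalue(k).values
            self.mask.copy_((importance > threshold).to(self.mask.dtype))
```

In such a setup, `linear_rank_schedule(num_layers)` would set the rank of each layer's adapter and `prune()` would be called on a fixed schedule during fine-tuning; the paper's exact pruning cadence and scoring rule should be taken from the full text rather than this sketch.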

Authors (2)
  1. Nadav Benedek (2 papers)
  2. Lior Wolf (217 papers)
Citations (1)