
DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution (2405.17357v3)

Published 27 May 2024 in cs.CL

Abstract: Fine-tuning large-scale pre-trained models is inherently resource-intensive. While it can enhance a model's capabilities, it also incurs substantial computational costs, which hinders practical deployment on downstream tasks. Existing parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) rely on a bypass framework that ignores the differing parameter-budget requirements across weight matrices, which may lead to suboptimal fine-tuning outcomes. To address this issue, we introduce the Dynamic Low-Rank Adaptation (DoRA) method. DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing the parameter budget to be pruned dynamically during training according to each component's importance to the task, making the most of a limited parameter budget. Experimental results demonstrate that DoRA achieves performance competitive with LoRA and full model fine-tuning, and outperforms various strong baselines under the same storage parameter budget. Our code is available at https://github.com/MIkumikumi0116/DoRA
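To make the decomposition concrete, below is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: a frozen linear layer augmented with a LoRA-style bypass whose rank-r update is expressed as r gated single-rank components that can be pruned down to a target budget. This is not the authors' implementation (see the linked repository for that); the class name, the initialization scheme, and the gate-magnitude importance score in prune_to_budget are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DynamicRankLoRALinear(nn.Module):
    """Wraps a frozen linear layer with a LoRA-style bypass whose rank-r update
    is split into r single-rank components, each controlled by a scalar gate
    that can be masked out (pruned) during training."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base  # frozen pre-trained projection
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.out_features, base.in_features
        # r independent single-rank factors: u_i (d_out) and v_i (d_in)
        self.u = nn.Parameter(torch.zeros(r, d_out))    # zero-init so the update starts at zero
        self.v = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.gate = nn.Parameter(torch.ones(r))         # scalar gate per component
        self.register_buffer("mask", torch.ones(r))     # 0/1 pruning mask
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the sum of the surviving single-rank updates
        coeff = (x @ self.v.T) * (self.gate * self.mask)        # (..., r)
        return self.base(x) + self.scaling * (coeff @ self.u)   # (..., d_out)

    @torch.no_grad()
    def prune_to_budget(self, keep: int) -> None:
        # Placeholder importance score: gate magnitude times factor norms.
        # The paper's actual importance criterion may differ.
        score = self.gate.abs() * self.u.norm(dim=1) * self.v.norm(dim=1)
        new_mask = torch.zeros_like(self.mask)
        new_mask[torch.topk(score, k=keep).indices] = 1.0
        self.mask.copy_(new_mask)
```

As an illustrative usage, wrapping a model's attention projections with DynamicRankLoRALinear(layer, r=8) and periodically calling prune_to_budget(keep=4) during training would reallocate the bypass budget toward the components that score as most important for the task.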

Authors (6)
  1. Yulong Mao (2 papers)
  2. Kaiyu Huang (16 papers)
  3. Changhao Guan (3 papers)
  4. Ganglin Bao (1 paper)
  5. Fengran Mo (35 papers)
  6. Jinan Xu (64 papers)
Citations (7)