Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs (2410.19694v1)

Published 25 Oct 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Fine-tuning LLMs has become a crucial technique for adapting pre-trained models to downstream tasks. However, the enormous size of LLMs poses significant challenges in terms of computational complexity and resource requirements. Low-Rank Adaptation (LoRA) has emerged as a promising solution. However, there exists a gap between the practical performance of low-rank adaptations and their theoretical optimum. In this work, we propose eXtreme Gradient Boosting LoRA (XGBLoRA), a novel framework that bridges this gap by leveraging the power of ensemble learning. Inspired by gradient boosting, XGBLoRA iteratively learns and merges a sequence of LoRA adaptations to refine model predictions. It achieves better performance than standard LoRA, while enjoying the computational efficiency of rank-1 adaptations. We provide theoretical analysis to show the convergence and optimality of our approach, and conduct extensive experiments on a range of natural language processing tasks. The results demonstrate that XGBLoRA consistently outperforms standard LoRA and achieves performance comparable to full fine-tuning with significantly fewer trainable parameters. This work advances parameter-efficient fine-tuning for LLMs, and offers a promising solution for adapting LLMs to downstream tasks while optimizing performance and efficiency.
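The abstract describes a gradient-boosting-style accumulation of rank-1 LoRA updates: in each boosting round a fresh rank-1 update ΔW_t = b_t a_t is trained and then merged into the weights, so the final model carries W_0 + Σ_t b_t a_t even though only a rank-1 adapter is trainable at any time. The sketch below illustrates that general idea in PyTorch; it is not the authors' released code, and the names (Rank1LoRALinear, boosted_rank1_finetune), the optimizer choice, the learning rate, and the reinitialization scheme are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class Rank1LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen nn.Linear plus a trainable rank-1 update b @ a."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the rank-1 factors are trained
        out_features, in_features = base.weight.shape
        # a starts at zero so the wrapped layer initially equals the frozen base layer.
        self.a = nn.Parameter(torch.zeros(1, in_features))
        self.b = nn.Parameter(torch.randn(out_features, 1) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (W + b a) x, computed without materializing the full update matrix
        return self.base(x) + (x @ self.a.t()) @ self.b.t()

    @torch.no_grad()
    def merge_and_reset(self) -> None:
        # Fold the learned rank-1 update into the frozen weight, then reinitialize
        # the factors so the next boosting round starts from the merged model.
        self.base.weight += self.b @ self.a
        self.a.zero_()
        nn.init.normal_(self.b, std=0.01)


def boosted_rank1_finetune(adapters, train_step, rounds: int = 8, steps_per_round: int = 100):
    # Hypothetical outer loop: each round trains fresh rank-1 factors on the task loss
    # and then merges them, so the accumulated update can exceed rank 1 overall.
    for _ in range(rounds):
        params = [p for ad in adapters for p in (ad.a, ad.b)]
        opt = torch.optim.AdamW(params, lr=1e-4)
        for _ in range(steps_per_round):
            loss = train_step(adapters)  # user-supplied forward pass returning a scalar loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        for ad in adapters:
            ad.merge_and_reset()
```

In practice one would wrap the attention and MLP projection layers of the LLM with such adapters and supply a train_step that computes the task loss over a minibatch; the paper's theoretical analysis and the exact boosting schedule are given in the full text.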
