QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning (2402.10462v1)

Published 16 Feb 2024 in cs.LG and cs.CL

Abstract: Finetuning LLMs requires huge GPU memory, restricting the choice of larger models that can be adapted. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding an efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and therefore cannot be reconfigured to lower ranks without further fine-tuning steps. This paper proposes QDyLoRA (Quantized Dynamic Low-Rank Adaptation), an efficient quantization approach for dynamic low-rank adaptation. Motivated by Dynamic LoRA, QDyLoRA is able to efficiently finetune LLMs on a set of pre-defined LoRA ranks. QDyLoRA enables fine-tuning Falcon-40b for ranks 1 to 64 on a single 32 GB V100 GPU through one round of fine-tuning. Experimental results show that QDyLoRA is competitive with QLoRA and outperforms it when employing its optimal rank.
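
The core idea can be illustrated with a minimal sketch of dynamic-rank LoRA training in PyTorch: at each step a rank is sampled from a pre-defined set and the low-rank factors are truncated to that rank, so any rank in the set can be selected at inference without another round of tuning. This is a hedged illustration, not the authors' implementation; the class name DynamicLoRALinear and the variable rank_choices are hypothetical, and a frozen full-precision nn.Linear stands in for the 4-bit quantized base layer that QLoRA/QDyLoRA actually use.

```python
import random
import torch
import torch.nn as nn

class DynamicLoRALinear(nn.Module):
    """LoRA layer whose active rank can change at every forward pass (illustrative sketch)."""

    def __init__(self, base: nn.Linear, max_rank: int = 64, alpha: float = 16.0):
        super().__init__()
        self.base = base              # frozen base weights (4-bit quantized in QDyLoRA)
        for p in self.base.parameters():
            p.requires_grad = False
        self.alpha = alpha
        # Full-size low-rank factors; only the first `rank` rows/columns are used per step.
        self.lora_A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))

    def forward(self, x: torch.Tensor, rank: int) -> torch.Tensor:
        A = self.lora_A[:rank, :]     # (rank, in_features)
        B = self.lora_B[:, :rank]     # (out_features, rank)
        return self.base(x) + (x @ A.T @ B.T) * (self.alpha / rank)

# Toy loop: sample a rank from a pre-defined set at every step, so after one
# round of tuning the adapter remains usable at any of these ranks.
layer = DynamicLoRALinear(nn.Linear(128, 128), max_rank=64)
optimizer = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-4)
rank_choices = [1, 2, 4, 8, 16, 32, 64]
for step in range(10):
    x = torch.randn(4, 128)
    rank = random.choice(rank_choices)
    loss = layer(x, rank).pow(2).mean()   # placeholder loss for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

After training, serving the adapter at, say, rank 8 simply means calling the layer with rank=8; no additional fine-tuning round is needed, which is the reconfigurability the abstract contrasts with fixed-rank QLoRA.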

References (20)
  1. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255.
  2. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
  3. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  4. QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314.
  5. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235.
  6. KronA: Parameter efficient tuning with Kronecker adapter. arXiv preprint arXiv:2212.10650.
  7. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366.
  8. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
  9. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR.
  10. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
  11. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
  12. OpenAssistant Conversations: Democratizing large language model alignment. arXiv preprint arXiv:2304.07327.
  13. AlphaTuning: Quantization-aware parameter-efficient adaptation of large-scale pre-trained language models. arXiv preprint arXiv:2210.03858.
  14. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950–1965.
  15. WebGLM: Towards an efficient web-enhanced question answering system with human preferences. arXiv preprint arXiv:2306.07906.
  16. UniPELT: A unified framework for parameter-efficient language model tuning. arXiv preprint arXiv:2110.07577.
  17. LST: Ladder side-tuning for parameter and memory efficient transfer learning. Advances in Neural Information Processing Systems, 35:12991–13005.
  18. Stanford Alpaca: An instruction-following LLaMA model.
  19. DyLoRA: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv preprint arXiv:2210.07558.
  20. Self-Instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560.
Authors (8)
  1. Hossein Rajabzadeh (8 papers)
  2. Mojtaba Valipour (8 papers)
  3. Tianshu Zhu (3 papers)
  4. Marzieh Tahaei (8 papers)
  5. Hyock Ju Kwon (5 papers)
  6. Ali Ghodsi (73 papers)
  7. Boxing Chen (67 papers)
  8. Mehdi Rezagholizadeh (78 papers)
Citations (6)