PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization (2402.16141v1)
Abstract: Supervised fine-tuning is the most common method for adapting LLMs to downstream tasks, but full fine-tuning of LLMs requires massive computational resources. Recently, parameter-efficient fine-tuning (PEFT) methods have been widely studied for their cost-effectiveness. LoRA is one of the most widely used of these methods; it assumes that the optimization process is essentially low-dimensional. Although LoRA fine-tuning is effective, a performance gap remains compared to full fine-tuning, since its weight updates are restricted to low-rank matrices. To break the low-rank bottleneck in LoRA optimization, we propose PeriodicLoRA (PLoRA), which accumulates low-rank update matrices multiple times to achieve a higher update rank. PLoRA has multiple training stages. During each stage, we still update only the LoRA weights. However, at the end of each stage, we unload the LoRA weights into the backbone parameters and then reinitialize the LoRA states. Experimental results show that PLoRA has stronger learning ability, up to approximately 1.8 times that of LoRA, without increasing memory usage. Furthermore, we introduce a momentum-based unloading strategy for PLoRA to mitigate training instability.
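The staged unload-and-reinitialize loop described in the abstract can be sketched roughly as follows. This is a minimal PyTorch-style illustration, not the authors' implementation: the `LoRALinear` layer, the initialization scheme, the Hugging Face-style loss interface, and the training-loop details are assumptions for clarity, and the momentum-based unloading strategy is omitted.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical minimal LoRA layer: frozen backbone weight W plus a
    trainable low-rank update scaling * (B @ A)."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Placeholder backbone weight; in practice this is loaded from the
        # pretrained model and kept frozen.
        self.weight = nn.Parameter(torch.zeros(out_features, in_features),
                                   requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        effective_weight = self.weight + self.scaling * self.lora_B @ self.lora_A
        return x @ effective_weight.T

    @torch.no_grad()
    def unload_and_reinit(self):
        # Merge the accumulated low-rank update into the frozen backbone ...
        self.weight += self.scaling * self.lora_B @ self.lora_A
        # ... then reset the LoRA state so the next stage learns a fresh
        # low-rank direction; the sum over stages can exceed a single rank.
        nn.init.normal_(self.lora_A, std=0.01)
        nn.init.zeros_(self.lora_B)


def train_plora(model, dataloader, num_stages, steps_per_stage, lr=1e-4):
    """Sketch of the multi-stage loop: within a stage only LoRA weights are
    trained; at the end of each stage they are unloaded and reinitialized."""
    for stage in range(num_stages):
        lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
        optimizer = torch.optim.AdamW(lora_params, lr=lr)
        for step, (inputs, labels) in enumerate(dataloader):
            if step >= steps_per_stage:
                break
            # Assumes a Hugging Face-style model that returns an output
            # object with a .loss attribute.
            loss = model(inputs, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        for module in model.modules():
            if isinstance(module, LoRALinear):
                module.unload_and_reinit()
```

Because the optimizer (and any momentum state) is rebuilt each stage, each stage's B @ A product is a fresh rank-r update folded into the backbone, which is how the accumulated update can exceed the rank of a single LoRA pair without raising peak memory.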
- Xiangdi Meng
- Damai Dai
- Weiyao Luo
- Zhe Yang
- Shaoxiang Wu
- Xiaochen Wang
- Peiyi Wang
- Qingxiu Dong
- Liang Chen
- Zhifang Sui