GeoLoRA: Geometric integration for parameter efficient fine-tuning (2410.18720v1)
Abstract: Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters than heuristic methods like AdaLoRA and LoRA, while maintaining critical convergence, descent, and error-bound theoretical guarantees. The resulting method is not only more efficient but also more robust to varying hyperparameter settings. We demonstrate the effectiveness of GeoLoRA on several state-of-the-art benchmarks, showing that it outperforms existing methods in both accuracy and computational efficiency.
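The abstract takes for granted the structure of a LoRA-style low-rank adapter and the idea of choosing each adapter's rank by discarding small singular values. The PyTorch sketch below illustrates both in a generic form; the class `LoRALinear`, the helper `truncate_rank`, the initialization, and the tolerance-based truncation rule are illustrative assumptions, not GeoLoRA's actual single-backprop, rank-adaptive update.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update W0 + (alpha / r) * B @ A,
    i.e. the generic LoRA adapter structure (a sketch, not GeoLoRA's method)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # pre-trained weights stay frozen
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapter path x -> A^T -> B^T, added to the frozen base output.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


def truncate_rank(B: torch.Tensor, A: torch.Tensor, tol: float = 1e-2):
    """Illustrative rank selection: keep only singular values of B @ A that exceed
    `tol` times the largest one. Forms the full product for clarity, not efficiency;
    the truncation rule here is an assumption, not GeoLoRA's budget-allocation rule."""
    U, S, Vh = torch.linalg.svd(B @ A, full_matrices=False)
    r_new = max(1, int((S > tol * S[0]).sum().item()))
    B_new = U[:, :r_new] * S[:r_new]      # absorb singular values into the left factor
    A_new = Vh[:r_new]
    return B_new, A_new


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), r=8)
    y = layer(torch.randn(4, 768))
    B_new, A_new = truncate_rank(layer.B.data + 1e-3 * torch.randn_like(layer.B), layer.A.data)
    print(y.shape, B_new.shape, A_new.shape)
```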
- Intrinsic dimensionality explains the effectiveness of language model fine-tuning, 2020.
- A rank-adaptive robust integrator for dynamical low-rank approximation. BIT Numerical Mathematics, 2022. URL https://doi.org/10.1007/s10543-021-00907-7.
- A parallel rank-adaptive integrator for dynamical low-rank approximation, 2023. URL https://arxiv.org/abs/2304.05660.
- Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv:1602.02830, 2016.
- Approximately optimal core shapes for tensor decompositions. In International Conference on Machine Learning, pp. 11237–11254. PMLR, 2023.
- Dynamic network surgery for efficient DNNs. Advances in Neural Information Processing Systems, 29, 2016.
- LoRA+: Efficient low-rank adaptation of large models, 2024.
- DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, 2023. URL https://arxiv.org/abs/2111.09543.
- Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1389–1397, 2017.
- Stochastic aspects of dynamical low-rank approximation in the context of machine learning. Optimization Online, 2024. URL https://optimization-online.org/?p=25971.
- Parameter-efficient transfer learning for NLP. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2790–2799. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/houlsby19a.html.
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
- Low-rank compression of neural nets: Learning the rank of each layer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- Initialization and regularization of factorized neural layers. In International Conference on Learning Representations, 2021.
- O. Koch and C. Lubich. Dynamical low-rank approximation. SIAM Journal on Matrix Analysis and Applications, 29(2):434–454, 2007a. ISSN 0895-4798. doi: 10.1137/050639703. URL https://doi.org/10.1137/050639703.
- ReLoRA: High-rank training through low-rank updates, 2023.
- DoRA: Enhancing parameter-efficient fine-tuning with dynamic rank distribution, 2024. URL https://api.semanticscholar.org/CorpusID:270062642.
- Pruning convolutional neural networks for resource efficient inference. In International Conference on Learning Representations, 2017.
- AdapterFusion: Non-destructive task composition for transfer learning, 2021. URL https://arxiv.org/abs/2005.00247.
- High-resolution image synthesis with latent diffusion models, 2021.
- DreamBooth: Fine-tuning text-to-image diffusion models for subject-driven generation, 2023. URL https://arxiv.org/abs/2208.12242.
- Hiroyuki Sato. Riemannian optimization and its applications, volume 670. Springer, 2021.
- Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations. In Advances in Neural Information Processing Systems, 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/7e98b00eeafcdaeb0c5661fb9355be3a-Paper-Conference.pdf.
- Federated dynamical low-rank training with global loss convergence guarantees, 2024. URL https://arxiv.org/abs/2406.17887.
- DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation, 2023.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding, 2019. URL https://arxiv.org/abs/1804.07461.
- Pufferfish: Communication-efficient models at no extra cost. Proceedings of Machine Learning and Systems, 3:365–386, 2021.
- Solving ordinary differential equations II, volume 375. Springer Berlin Heidelberg New York, 1996.
- Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4820–4828, 2016.
- BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models, 2022. URL https://arxiv.org/abs/2106.10199.
- Rank-adaptive spectral pruning of convolutional layers during training. In Advances in Neural Information Processing Systems, 2024.
- AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning, 2023.
- GaLore: Memory-efficient LLM training by gradient low-rank projection, 2024.