LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits (2502.08141v1)

Published 12 Feb 2025 in cs.LG, cs.AR, cs.CL, and cs.PF

Abstract: Fine-tuning LLMs is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization (mapping, threshold selection, and precision assignment) while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance-precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
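
A fractional figure such as 1.15 bits per parameter can arise when different groups of weights are stored at different precisions. The sketch below is only an illustration of that arithmetic and of threshold-based quantization in general; it is not LowRA's actual algorithm. The function names (`quantize`, `average_bits`), the thresholds, the levels, and the 85/15 split are all assumptions chosen for the example, and the storage cost of scales or other metadata is ignored.

```python
from bisect import bisect_right


def quantize(weights, thresholds, levels):
    """Map each weight to a representative level.

    `thresholds` (sorted ascending) split the real line into
    len(thresholds) + 1 bins; `levels` holds one value per bin.
    All values here are hypothetical, not taken from the paper.
    """
    assert len(levels) == len(thresholds) + 1
    return [levels[bisect_right(thresholds, w)] for w in weights]


def average_bits(group_sizes, group_bits):
    """Average bits per parameter when each group uses its own bit-width."""
    total = sum(group_sizes)
    return sum(n * b for n, b in zip(group_sizes, group_bits)) / total


weights = [-0.7, -0.1, 0.2, 0.9]

# 1-bit quantization: one threshold, two levels.
one_bit = quantize(weights, thresholds=[0.0], levels=[-0.5, 0.5])

# 2-bit quantization: three thresholds, four levels.
two_bit = quantize(
    weights,
    thresholds=[-0.5, 0.0, 0.5],
    levels=[-0.75, -0.25, 0.25, 0.75],
)

# Assumed split: 85 groups at 1 bit and 15 groups at 2 bits, equal group size.
avg = average_bits(group_sizes=[85, 15], group_bits=[1, 2])
print(one_bit, two_bit, avg)
```

With the assumed split, the average is (85 × 1 + 15 × 2) / 100 = 1.15 bits per parameter, which shows how a mixed assignment of 1-bit and 2-bit groups can land below 2 bits on average.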


Authors (4)

Tweets

https://twitter.com/zhou_cyrus68804/status/1944417663087771941

https://twitter.com/ZainHasan6/status/1890104839709110755

https://twitter.com/HPCPapers/status/1896992309314539734
