
LoQT: Low-Rank Adapters for Quantized Pretraining (2405.16528v4)

Published 26 May 2024 in cs.LG and cs.CL

Abstract: Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
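The abstract describes a training loop in which a frozen, quantized full-rank weight is paired with trainable low-rank factors that are periodically merged back into the quantized weight. The sketch below illustrates that loop only in broad strokes: the class and function names (LoQTLinearSketch, quantize_stub, merge_and_requantize), the symmetric uniform quantizer, the rank, and the fixed merge interval are all illustrative assumptions, not the paper's implementation; in particular, LoQT's gradient-based initialization of the low-rank factors and its actual quantization scheme are not reproduced here.

```python
import torch

def quantize_stub(w, bits=4):
    # Symmetric uniform quantizer standing in for the paper's quantized
    # weight representation (the actual scheme is not specified here).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q, scale

def dequantize_stub(q, scale):
    return q * scale

class LoQTLinearSketch(torch.nn.Module):
    """Hypothetical layer: frozen quantized base weight plus trainable low-rank factors."""
    def __init__(self, in_features, out_features, rank=16):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        q, scale = quantize_stub(w)
        self.register_buffer("w_q", q)       # quantized full-rank weight (not trained)
        self.register_buffer("scale", scale)
        # Trainable low-rank factors. LoQT initializes these from a
        # gradient-based factorization; plain zero/random init is used here.
        self.A = torch.nn.Parameter(torch.zeros(out_features, rank))
        self.B = torch.nn.Parameter(torch.randn(rank, in_features) * 0.02)

    def forward(self, x):
        w = dequantize_stub(self.w_q, self.scale) + self.A @ self.B
        return x @ w.t()

    @torch.no_grad()
    def merge_and_requantize(self):
        # Periodic merge step: fold the learned low-rank update into the
        # full-rank weight, re-quantize it, and reset the adapter.
        w = dequantize_stub(self.w_q, self.scale) + self.A @ self.B
        q, scale = quantize_stub(w)
        self.w_q.copy_(q)
        self.scale.copy_(scale)
        self.A.zero_()

# Toy loop: only the low-rank factors receive gradients and optimizer state;
# every `merge_every` steps the adapter is folded into the quantized weight.
layer = LoQTLinearSketch(64, 64, rank=8)
opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-3)
merge_every = 100
for step in range(300):
    x = torch.randn(32, 64)
    loss = layer(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    if (step + 1) % merge_every == 0:
        layer.merge_and_requantize()
```

The memory argument behind this structure is that only the small factors A and B need full-precision storage and optimizer state, while the full-rank weight stays quantized between merges.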
