Scaling LoRA-GA to larger pretrained models
Determine whether Low-Rank Adaptation with Gradient Approximation (LoRA-GA) retains its convergence-speed and performance advantages on substantially larger pretrained models such as Llama 2-70B, thereby establishing its effectiveness at scale relative to full fine-tuning.
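To make the technique concrete, the sketch below shows, for a single linear layer in PyTorch, how a LoRA adapter can be initialized from the SVD of an estimated full-weight gradient so that its early update approximates the full fine-tuning step. The function name, the choice of which singular directions go to A versus B, the scaling factor, and the frozen-weight offset are illustrative assumptions and may differ from the paper's actual procedure.

```python
# Illustrative sketch (not the paper's reference implementation) of a
# gradient-approximation-style LoRA initialization for one linear layer.
import torch

def gradient_aligned_lora_init(weight_grad: torch.Tensor, rank: int, alpha: float = 16.0):
    """Build LoRA factors B (d_out x r) and A (r x d_in) from the SVD of an
    estimated full-weight gradient G (d_out x d_in), so that the adapter's
    first update spans the top singular directions of G.
    Assumes 2 * rank <= min(d_out, d_in)."""
    U, S, Vh = torch.linalg.svd(weight_grad.float(), full_matrices=False)
    # Disjoint singular subspaces for B and A (assumed split; the paper's
    # exact slicing and scaling may differ).
    B = U[:, :rank].contiguous()             # d_out x r, top left singular directions
    A = Vh[rank:2 * rank, :].contiguous()    # r x d_in, next right singular directions
    scaling = alpha / rank
    # Offset subtracted from the frozen weight so the adapted layer reproduces
    # the pretrained output at step 0: (W - offset) x + scaling * B @ (A @ x) = W x.
    offset = scaling * (B @ A)
    return B, A, offset
```

For a model at the Llama 2-70B scale, even estimating `weight_grad` for every adapted projection on a calibration batch carries nontrivial memory and compute cost, which is part of what a scaled-up validation would need to account for.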
References
However, due to computational resource constraints, we have not validated LoRA-GA on larger pre-trained models (e.g., Llama 2-70B).
— LoRA-GA: Low-Rank Adaptation with Gradient Approximation
(arXiv:2407.05000, Wang et al., 6 Jul 2024), Section: Limitations