Benchmark-generalization of LoRA-GA across datasets

Determine whether the performance improvements and convergence behavior of LoRA-GA observed on MTBench, GSM8K, and Human-eval hold across a broader set of evaluation datasets and benchmarks, and thereby establish how general its advantages are.

Background

LoRA-GA showed strong results on a limited set of benchmarks for dialogue, mathematical reasoning, and code generation.

The authors did not assess the method on other datasets, so it remains uncertain whether these results hold consistently across diverse evaluation settings.

References

Another limitation pertains to our evaluation scope. While we provide evaluations on MTBench, GSM8K, and Human-eval, we did not assess our method on other datasets. Consequently, we cannot fully guarantee that our findings are universally consistent across all benchmarks.

— LoRA-GA: Low-Rank Adaptation with Gradient Approximation (2407.05000 - Wang et al., 6 Jul 2024) in Section: Limitations
