
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models (2404.02827v3)

Published 3 Apr 2024 in cs.LG

Abstract: This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory efficient approach to the full parameter finetuning of LLMs. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory efficient baselines such as LoRA. It also demonstrates that BAdam can achieve comparable or even superior performance compared to Adam. Finally, the ablation study using SGD's update rule illustrates the suitability of BCD for finetuning LLMs. Our code can be easily integrated into any PyTorch-based codebase and is available at https://github.com/Ledzy/BAdam.
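To make the core idea concrete, below is a minimal sketch of block coordinate descent combined with Adam's update rule, in the spirit of BAdam but not the authors' implementation (see the linked repository for that). It assumes a generic PyTorch model whose parameter blocks are taken to be its top-level child modules; the block ordering, `steps_per_block`, and learning rate are illustrative choices. The memory saving comes from holding gradients and Adam states only for the single active block.

```python
# Sketch of block-coordinate finetuning with Adam: only one parameter block
# is trainable at a time, so optimizer state exists for that block alone.
# Hyperparameters and the block partition are illustrative assumptions.
import torch
from torch import nn


def block_coordinate_adam(model: nn.Module, data_loader, loss_fn,
                          steps_per_block: int = 50, lr: float = 1e-5,
                          num_block_epochs: int = 1):
    # Treat each immediate child module (e.g., a transformer layer) as one block.
    blocks = [list(child.parameters()) for child in model.children()
              if any(p.requires_grad for p in child.parameters())]

    data_iter = iter(data_loader)
    for _ in range(num_block_epochs):
        for block in blocks:
            # Freeze all parameters, then activate only the current block so
            # gradients and Adam moments are allocated for this block alone.
            for p in model.parameters():
                p.requires_grad_(False)
            for p in block:
                p.requires_grad_(True)

            # A fresh Adam instance holds state only for the active block.
            optimizer = torch.optim.Adam(block, lr=lr)

            for _ in range(steps_per_block):
                try:
                    inputs, targets = next(data_iter)
                except StopIteration:
                    data_iter = iter(data_loader)
                    inputs, targets = next(data_iter)
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()

            # Discard the block's optimizer state before moving to the next block.
            del optimizer
```

In this sketch the cost of storing Adam's first and second moments scales with the size of one block rather than the full model, which is the source of the memory savings the abstract describes.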
