Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning (2402.15751v1)

Published 24 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: While fine-tuning LLMs for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient estimates in zeroth-order optimization often depends on the data dimensionality, potentially explaining why MeZO still exhibits significant performance drops compared to standard fine-tuning across various tasks. Inspired by the success of Parameter-Efficient Fine-Tuning (PEFT), this paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse MeZO. Additionally, we develop a memory-optimized implementation for sparse masking, ensuring the algorithm requires only inference-level memory consumption, allowing Sparse MeZO to fine-tune LLaMA-30b on a single A100 GPU. Experimental results illustrate that Sparse MeZO consistently improves both performance and convergence speed over MeZO without any overhead. For example, it achieves a 9% absolute accuracy improvement and 3.5x speedup over MeZO on the RTE task.
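The abstract describes a two-forward-pass zeroth-order update applied only to a masked subset of parameters. Below is a minimal sketch of that idea, assuming a PyTorch model and a user-supplied `loss_fn(model, batch)` that returns a scalar loss. The mask rule shown here (a per-tensor magnitude threshold built by `build_masks`) is a placeholder assumption for illustration, not the paper's actual parameter selection scheme, and the function names are hypothetical.

```python
# Sketch of a sparse, SPSA-style zeroth-order step (two forward passes, no backward).
import torch


def build_masks(model, sparsity=0.75):
    """Placeholder mask: mark the (1 - sparsity) fraction of smallest-magnitude
    entries in each parameter tensor as the trainable subset."""
    masks = {}
    for name, p in model.named_parameters():
        k = max(1, int((1.0 - sparsity) * p.numel()))
        thresh = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() <= thresh).float()
    return masks


@torch.no_grad()
def sparse_zo_step(model, batch, loss_fn, masks, lr=1e-6, eps=1e-3, seed=0):
    """One sparse zeroth-order update: perturb only masked coordinates by +/- eps*z,
    estimate the directional derivative from the loss difference, then step."""

    def perturb(scale):
        # Regenerate the same z from the seed each time, so z never has to be stored.
        gen = torch.Generator(device="cpu").manual_seed(seed)
        for name, p in model.named_parameters():
            z = torch.randn(p.shape, generator=gen).to(p.device)
            p.add_(scale * eps * z * masks[name])

    perturb(+1.0)                               # theta + eps * z (masked)
    loss_plus = loss_fn(model, batch).item()
    perturb(-2.0)                               # theta - eps * z (masked)
    loss_minus = loss_fn(model, batch).item()
    perturb(+1.0)                               # restore original parameters

    grad_proj = (loss_plus - loss_minus) / (2.0 * eps)

    # Replay the same seed so the update direction matches the perturbation.
    gen = torch.Generator(device="cpu").manual_seed(seed)
    for name, p in model.named_parameters():
        z = torch.randn(p.shape, generator=gen).to(p.device)
        p.sub_(lr * grad_proj * z * masks[name])
    return (loss_plus + loss_minus) / 2.0
```

Because the perturbation is reconstructed from a seed and restricted to the mask, the step keeps memory close to inference cost, which is the property the abstract highlights for fine-tuning LLaMA-30b on a single A100.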

Authors (6)
  1. Yong Liu (721 papers)
  2. Zirui Zhu (6 papers)
  3. Chaoyu Gong (5 papers)
  4. Minhao Cheng (43 papers)
  5. Cho-Jui Hsieh (211 papers)
  6. Yang You (173 papers)
Citations (5)