Sparse is Enough in Fine-tuning Pre-trained Large Language Models (2312.11875v3)
Abstract: With the prevalence of the pre-training-fine-tuning paradigm, how to efficiently adapt pre-trained models to downstream tasks has become an intriguing question. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation. Although PEFT has demonstrated effectiveness and is widely applied, its underlying principles remain unclear. In this paper, we adopt the PAC-Bayesian generalization error bound, viewing pre-training as a shift of the prior distribution that leads to a tighter bound on the generalization error. We validate this shift from the perspectives of oscillations in the loss landscape and the quasi-sparsity of the gradient distribution. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/.
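For orientation, the PAC-Bayesian argument can be summarized with a standard McAllester-style bound; the form below is a generic illustration, and the paper's exact statement, assumptions, and notation may differ.

```latex
% Standard McAllester-style PAC-Bayes bound (illustrative; not the paper's
% exact statement). Q is the posterior over fine-tuned parameters, P a prior
% centered at the pre-trained weights, n the number of training samples,
% and \delta the confidence level.
\[
\mathbb{E}_{\theta \sim Q}\, L(\theta)
\;\le\;
\mathbb{E}_{\theta \sim Q}\, \hat{L}(\theta)
\;+\;
\sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}} .
\]
% Intuition: if fine-tuning only needs a sparse increment on top of the
% pre-trained weights, Q stays close to the pre-training-induced prior P,
% so KL(Q||P) is small and the bound is correspondingly tighter.
```

The algorithmic idea is to update only a sparse subset of parameter entries selected by gradient magnitude on the downstream task. The following is a minimal PyTorch-style sketch of that general recipe, assuming top-k selection over accumulated squared gradients and a gradient-masking step before each optimizer update; it is not the authors' implementation (see the linked repository for the official code), and `keep_ratio`, `n_batches`, and the helper names are illustrative choices.

```python
import torch

def compute_sparse_masks(model, data_loader, loss_fn, keep_ratio=0.01, n_batches=4):
    """Estimate per-parameter gradient magnitudes on a few downstream batches
    and keep only the top `keep_ratio` fraction of entries in each tensor.
    (Illustrative selection rule; the official SIFT code may differ.)"""
    grad_sq = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for i, (inputs, targets) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None and n in grad_sq:
                grad_sq[n] += p.grad.detach() ** 2
    masks = {}
    for n, g in grad_sq.items():
        k = max(1, int(keep_ratio * g.numel()))
        threshold = g.flatten().topk(k).values.min()
        masks[n] = (g >= threshold).float()
    return masks

def apply_masks_to_grads(model, masks):
    """Zero out gradients of unselected entries so the optimizer only
    updates the sparse increment."""
    for n, p in model.named_parameters():
        if p.grad is not None and n in masks:
            p.grad.mul_(masks[n])

# Usage sketch: select the sparse subset once, then fine-tune as usual.
# masks = compute_sparse_masks(model, train_loader, loss_fn)
# for inputs, targets in train_loader:
#     optimizer.zero_grad()
#     loss_fn(model(inputs), targets).backward()
#     apply_masks_to_grads(model, masks)
#     optimizer.step()
```

Because only the masked entries receive nonzero updates, the fine-tuned model stays within a sparse increment of the pre-trained weights, which is the property the PAC-Bayesian view above relies on.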
Authors: Weixi Song, Zuchao Li, Lefei Zhang, Hai Zhao, Bo Du