Boosting Large Language Models with Mask Fine-Tuning (2503.22764v1)
Abstract: Mainstream LLM fine-tuning protocols typically keep the model intact, and no prior work has questioned whether maintaining the model's integrity is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that properly breaking the model's integrity can surprisingly lead to improved performance. Specifically, MFT learns a set of binary masks supervised by the typical LLM fine-tuning objective. Extensive experiments show that MFT yields a consistent performance boost across various domains and backbones (e.g., 1.95%/1.88% average gain in coding with LLaMA2-7B/3.1-8B). Detailed analyses study the proposed MFT from different hyperparameter perspectives for better insight. In particular, MFT naturally extends the current LLM training protocol by applying mask learning on top of a complete, well-trained model. This study broadens the use of mask learning from its conventional network-pruning context for model compression to a more general scope.
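The abstract does not give implementation details, but the core idea — freezing pre-trained weights and learning binary masks under the usual fine-tuning loss — can be sketched with a straight-through estimator over real-valued mask scores, a common trick in mask learning. Everything below (the toy linear layer, the regression objective, the learning rate) is illustrative and not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" weights of a toy linear layer (a stand-in for an
# LLM weight matrix; mask fine-tuning keeps weights fixed, learns masks).
W = rng.normal(size=(8, 8))

# Real-valued scores; the binary mask keeps a weight where its score > 0.
scores = rng.normal(scale=0.1, size=W.shape)

x = rng.normal(size=(16, 8))          # toy input batch
y_true = x @ rng.normal(size=(8, 8))  # toy regression target

def forward(scores):
    mask = (scores > 0).astype(W.dtype)   # hard binary mask
    y = x @ (W * mask).T
    loss = np.mean((y - y_true) ** 2)
    return loss, mask, y

losses = []
lr = 0.01
for step in range(200):
    loss, mask, y = forward(scores)
    losses.append(loss)
    # Gradient of the loss w.r.t. the effective (masked) weights ...
    grad_eff = 2.0 / y.size * (y - y_true).T @ x
    # ... passed straight through the hard threshold to the scores
    # (straight-through estimator: treat d mask / d score as 1).
    scores -= lr * grad_eff * W

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The weights `W` never change; only which entries are switched on does, which is what distinguishes mask learning from standard fine-tuning.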
- Mingyuan Zhang
- Yue Bai
- Huan Wang
- Yizhou Wang
- Qihua Dong
- Yun Fu