Boosting Large Language Models with Mask Fine-Tuning (2503.22764v1)

Published 27 Mar 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Mainstream LLM fine-tuning protocols usually keep the model intact, and no prior work has questioned whether preserving the model's integrity is indispensable for performance. In this work, we introduce Mask Fine-Tuning (MFT), a new LLM fine-tuning paradigm showing that properly breaking the model's integrity can, surprisingly, improve performance. Specifically, MFT learns a set of binary masks supervised by the standard LLM fine-tuning objective. Extensive experiments show that MFT yields a consistent performance boost across various domains and backbones (e.g., a 1.95%/1.88% average gain in coding with LLaMA2-7B/3.1-8B). Detailed studies examine MFT from different hyperparameter perspectives for better insight. Notably, MFT naturally extends the current LLM training protocol by being applied to a complete, well-trained model, and it broadens mask learning from its conventional network-pruning context for model compression to a more general scope.
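
The sketch below illustrates the general idea of learning binary masks over frozen pretrained weights under the usual fine-tuning loss. It is a minimal, hypothetical PyTorch example: the MaskedLinear wrapper, the per-weight masking granularity, the straight-through estimator, and the toy training loop are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of mask fine-tuning on a single linear layer (assumption:
# per-weight binary masks trained with a straight-through estimator; the
# paper's masking granularity, initialization, and sparsity control may differ).
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Linear layer whose frozen pretrained weights are gated by a learnable binary mask."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze the pretrained weights; only the mask scores are trained.
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        if linear.bias is not None:
            self.bias = nn.Parameter(linear.bias.detach().clone(), requires_grad=False)
        else:
            self.bias = None
        # Real-valued scores: non-negative scores keep a weight, negative scores drop it.
        # Initializing at zero keeps the full model at the start of fine-tuning.
        self.scores = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard_mask = (self.scores >= 0).float()
        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # while gradients flow back through the sigmoid surrogate.
        soft_mask = torch.sigmoid(self.scores)
        mask = hard_mask + soft_mask - soft_mask.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Toy usage: wrap a layer, then optimize only the mask scores with the usual
# task loss (a stand-in regression loss here; an LLM would use its LM loss).
layer = MaskedLinear(nn.Linear(16, 4))
opt = torch.optim.AdamW([layer.scores], lr=1e-2)
x, target = torch.randn(8, 16), torch.randn(8, 4)
for _ in range(10):
    loss = nn.functional.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this reading, fine-tuning never changes the pretrained weights themselves; it only decides which of them to keep active, which is what "breaking the integrity of the model" refers to in the abstract.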

Authors (6)
  1. Mingyuan Zhang (41 papers)
  2. Yue Bai (28 papers)
  3. Huan Wang (211 papers)
  4. Yizhou Wang (162 papers)
  5. Qihua Dong (4 papers)
  6. Yun Fu (131 papers)