Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization (2305.14760v2)
Abstract: Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit, which degrades performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. Because Bi-Drop estimates the sub-net in an in-batch manner, it avoids the hysteresis in sub-net updating that affects previous methods relying on asynchronous sub-net estimation. Moreover, Bi-Drop needs only a single mini-batch to estimate the sub-net, so it makes more efficient use of the training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Empirical results further show that Bi-Drop exhibits strong generalization and robustness under domain transfer, data imbalance, and low-resource scenarios.
- Shoujie Tong
- Heming Xia
- Damai Dai
- Runxin Xu
- Tianyu Liu
- Binghuai Lin
- Yunbo Cao
- Zhifang Sui
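The abstract describes the core idea at a high level: on a single mini-batch, several dropout-induced sub-nets are sampled, their gradients are used to estimate which parameters to update, and only that sub-net is optimized. The sketch below is a minimal, illustrative PyTorch rendering of that in-batch estimate-then-update loop, not the paper's actual algorithm; in particular, the `bi_drop_style_step` function, the toy model, and the top-k magnitude-based selection rule (`keep_ratio`) are assumptions standing in for the paper's sub-net estimation criterion.

```python
# Illustrative sketch only: several forward passes with different dropout
# masks are run on the SAME mini-batch, per-parameter gradient statistics
# are accumulated, and only a selected subset of parameters (the "sub-net")
# receives the update. The selection rule (top-k by mean gradient magnitude)
# is a hypothetical stand-in for the paper's estimation criterion.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()

def bi_drop_style_step(inputs, labels, n_subnets=4, keep_ratio=0.5):
    """One update on a single mini-batch using gradients from several
    dropout-generated sub-nets (illustrative, not the paper's method)."""
    model.train()  # keep dropout active so each pass samples a new sub-net
    grad_sums = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

    # 1) Estimate: accumulate gradients from several dropout sub-nets, all
    #    computed on the same mini-batch (the "in-batch" aspect).
    for _ in range(n_subnets):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grad_sums[n] += p.grad.detach()

    # 2) Select: keep parameters with the largest mean gradient magnitude
    #    (assumed criterion) and mask out the rest.
    masks = {}
    for n, g in grad_sums.items():
        mean_grad = (g / n_subnets).abs()
        k = max(1, int(keep_ratio * mean_grad.numel()))
        threshold = mean_grad.flatten().topk(k).values.min()
        masks[n] = (mean_grad >= threshold).float()

    # 3) Optimize: apply the averaged gradient only to the selected sub-net.
    optimizer.zero_grad()
    for n, p in model.named_parameters():
        p.grad = (grad_sums[n] / n_subnets) * masks[n]
    optimizer.step()

# Usage with dummy data:
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
bi_drop_style_step(x, y)
```

The key property the sketch tries to convey is that estimation and optimization happen synchronously on the same mini-batch, so the selected sub-net reflects the current parameters rather than a stale estimate from earlier batches.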