Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation (2002.10345v1)
Published 24 Feb 2020 in cs.CL and cs.LG
Abstract: Fine-tuning pre-trained language models like BERT has become an effective approach in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-training tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. The experiments on text classification and natural language inference tasks show our proposed methods can significantly improve the adaptation of BERT without any external data or knowledge.
- Yige Xu (9 papers)
- Xipeng Qiu (257 papers)
- Ligao Zhou (1 paper)
- Xuanjing Huang (287 papers)
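
The sketch below illustrates the general idea behind the two mechanisms named in the abstract: a "self-ensemble" teacher built by averaging the student's own parameters during fine-tuning, and a "self-distillation" loss that pulls the student's predictions toward that teacher. This is a minimal approximation, not the paper's exact recipe: it assumes an exponential moving average as a stand-in for the paper's parameter averaging, and the names `build_model`, `train_loader`, `lambda_sd`, and `ema_decay` are illustrative placeholders.

```python
# Hypothetical sketch of self-ensemble + self-distillation fine-tuning.
# build_model / train_loader / lambda_sd / ema_decay are assumptions, not from the paper.
import copy
import torch
import torch.nn.functional as F

student = build_model()            # e.g. BERT encoder + classification head (assumed)
teacher = copy.deepcopy(student)   # self-ensemble: a running average of the student
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
lambda_sd, ema_decay = 1.0, 0.999  # assumed hyper-parameters

for inputs, labels in train_loader:
    logits = student(**inputs)
    with torch.no_grad():
        teacher_logits = teacher(**inputs)

    # Task loss plus a self-distillation term toward the averaged teacher's predictions.
    loss = F.cross_entropy(logits, labels) \
         + lambda_sd * F.mse_loss(logits, teacher_logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update the self-ensemble teacher as an exponential moving average of the student.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema_decay).add_(s, alpha=1 - ema_decay)
```

Because the teacher requires no separate training run or external data, the extra cost over plain fine-tuning is one additional forward pass per batch and the parameter-averaging update, which matches the abstract's claim of improving adaptation without external data or knowledge.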