MC-BERT: Efficient Language Pre-Training via a Meta Controller (2006.05744v2)

Published 10 Jun 2020 in cs.CL and cs.LG

Abstract: Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to its reduced complexity of the pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training input and candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
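
The abstract describes the objective only at a high level: a meta controller corrupts the input and proposes candidate tokens for each position, and the main model solves a multi-choice cloze test with a reject option over those candidates. The PyTorch sketch below illustrates that idea under simplifying assumptions; it is not the authors' implementation, and the names `TinyEncoder`, `MetaController`, `MCBertLikeModel`, the candidate count `K`, and the toy labels are all invented for this example.

```python
# Minimal sketch (not the MC-BERT reference code) of a multi-choice cloze
# objective with a reject option, as summarized in the abstract above.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, K = 1000, 64, 9  # toy sizes: K candidates per position, plus one reject option

class TinyEncoder(nn.Module):
    """Stand-in for a BERT-style encoder: embeds tokens, returns hidden states."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)

    def forward(self, ids):
        return self.layer(self.embed(ids))  # [B, T, H]

class MetaController(nn.Module):
    """Small generator that proposes K candidate tokens for each position."""
    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder()
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, masked_ids):
        logits = self.lm_head(self.encoder(masked_ids))  # [B, T, V]
        return logits.topk(K, dim=-1).indices            # [B, T, K] candidate token ids

class MCBertLikeModel(nn.Module):
    """Main model: scores the K candidates plus a learned 'reject' option."""
    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder()
        self.token_embed = nn.Embedding(VOCAB, HIDDEN)
        self.reject = nn.Parameter(torch.zeros(HIDDEN))  # embedding for the reject choice

    def forward(self, corrupted_ids, candidates):
        h = self.encoder(corrupted_ids)                        # [B, T, H]
        cand_emb = self.token_embed(candidates)                # [B, T, K, H]
        reject = self.reject.expand(*candidates.shape[:2], 1, HIDDEN)
        choices = torch.cat([cand_emb, reject], dim=2)         # [B, T, K+1, H]
        return torch.einsum('bth,btkh->btk', h, choices)       # per-position choice logits

# Toy forward pass: the controller proposes candidates, the model either picks
# one of the K candidates or rejects them all. Labels here are placeholders.
ids = torch.randint(0, VOCAB, (2, 16))
controller, model = MetaController(), MCBertLikeModel()
candidates = controller(ids)                                    # [2, 16, K]
logits = model(ids, candidates)                                 # [2, 16, K+1]
labels = torch.full((2, 16), K)                                 # toy labels: always "reject" (index K)
loss = F.cross_entropy(logits.reshape(-1, K + 1), labels.reshape(-1))
print(loss.item())
```

In the actual method the labels would point to the original token when it appears among the candidates and to the reject option otherwise; the toy labels above exist only so the loss computes end to end.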

Authors (8)
  1. Zhenhui Xu (8 papers)
  2. Linyuan Gong (10 papers)
  3. Guolin Ke (43 papers)
  4. Di He (108 papers)
  5. Shuxin Zheng (32 papers)
  6. Liwei Wang (239 papers)
  7. Jiang Bian (229 papers)
  8. Tie-Yan Liu (242 papers)
Citations (17)
