MC-BERT: Efficient Language Pre-Training via a Meta Controller (2006.05744v2)

Published 10 Jun 2020 in cs.CL and cs.LG

Abstract: Pre-trained contextual representations (e.g., BERT) have become the foundation to achieve state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to its reduced complexity of the pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides training input and candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.
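
The abstract describes the objective only at a high level: a meta controller corrupts the input and proposes candidate tokens for each position, and the main model solves a multi-choice cloze test with a reject option over those candidates. The PyTorch sketch below illustrates that idea under simplifying assumptions; it is not the authors' implementation, and the names `TinyEncoder`, `MetaController`, `MCBertLikeModel`, the candidate count `K`, and the toy labels are all invented for this example.

```python
# Minimal sketch (not the MC-BERT reference code) of a multi-choice cloze
# objective with a reject option, as summarized in the abstract above.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, K = 1000, 64, 9  # toy sizes: K candidates per position, plus one reject option

class TinyEncoder(nn.Module):
    """Stand-in for a BERT-style encoder: embeds tokens, returns hidden states."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)

    def forward(self, ids):
        return self.layer(self.embed(ids))  # [B, T, H]

class MetaController(nn.Module):
    """Small generator that proposes K candidate tokens for each position."""
    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder()
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, masked_ids):
        logits = self.lm_head(self.encoder(masked_ids))  # [B, T, V]
        return logits.topk(K, dim=-1).indices            # [B, T, K] candidate token ids

class MCBertLikeModel(nn.Module):
    """Main model: scores the K candidates plus a learned 'reject' option."""
    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder()
        self.token_embed = nn.Embedding(VOCAB, HIDDEN)
        self.reject = nn.Parameter(torch.zeros(HIDDEN))  # embedding for the reject choice

    def forward(self, corrupted_ids, candidates):
        h = self.encoder(corrupted_ids)                        # [B, T, H]
        cand_emb = self.token_embed(candidates)                # [B, T, K, H]
        reject = self.reject.expand(*candidates.shape[:2], 1, HIDDEN)
        choices = torch.cat([cand_emb, reject], dim=2)         # [B, T, K+1, H]
        return torch.einsum('bth,btkh->btk', h, choices)       # per-position choice logits

# Toy forward pass: the controller proposes candidates, the model either picks
# one of the K candidates or rejects them all. Labels here are placeholders.
ids = torch.randint(0, VOCAB, (2, 16))
controller, model = MetaController(), MCBertLikeModel()
candidates = controller(ids)                                    # [2, 16, K]
logits = model(ids, candidates)                                 # [2, 16, K+1]
labels = torch.full((2, 16), K)                                 # toy labels: always "reject" (index K)
loss = F.cross_entropy(logits.reshape(-1, K + 1), labels.reshape(-1))
print(loss.item())
```

In the actual method the labels would point to the original token when it appears among the candidates and to the reject option otherwise; the toy labels above exist only so the loss computes end to end.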

Authors (8)
  1. Zhenhui Xu (8 papers)
  2. Linyuan Gong (10 papers)
  3. Guolin Ke (43 papers)
  4. Di He (108 papers)
  5. Shuxin Zheng (32 papers)
  6. Liwei Wang (239 papers)
  7. Jiang Bian (229 papers)
  8. Tie-Yan Liu (242 papers)
Citations (17)
