
Learning to Sample Replacements for ELECTRA Pre-Training (2106.13715v1)

Published 25 Jun 2021 in cs.CL

Abstract: ELECTRA pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. Despite its compelling performance, ELECTRA suffers from two issues. First, there is no direct feedback loop from the discriminator to the generator, which renders replacement sampling inefficient. Second, the generator's predictions tend to become over-confident as training proceeds, biasing replacements toward the correct tokens. In this paper, we propose two methods to improve replacement sampling for ELECTRA pre-training. Specifically, we augment sampling with a hardness prediction mechanism, so that the generator can encourage the discriminator to learn what it has not yet acquired. We also prove that efficient sampling reduces the training variance of the discriminator. Moreover, we propose to use a focal loss for the generator in order to alleviate the oversampling of correct tokens as replacements. Experimental results show that our method improves ELECTRA pre-training on various downstream tasks.
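
A minimal sketch of the focal-loss idea for the generator's masked-token predictions, assuming a PyTorch setup; the function name, the choice of gamma, and the masking convention are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def focal_mlm_loss(logits, targets, gamma=1.0, ignore_index=-100):
    """Focal loss over masked-token predictions (sketch).

    Down-weights tokens the generator already predicts with high
    confidence, so sampled replacements fall less often on the easy,
    correct token. `gamma` controls how strongly easy tokens are
    down-weighted (gamma=0 recovers standard cross-entropy).
    """
    log_probs = F.log_softmax(logits, dim=-1)               # [batch, seq, vocab]
    mask = targets.ne(ignore_index)                          # positions with a masked target
    safe_targets = targets.clamp(min=0)                      # make ignored ids valid for gather
    target_log_p = log_probs.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    p_t = target_log_p.exp()                                  # generator confidence on the true token
    loss = -((1.0 - p_t) ** gamma) * target_log_p             # focal modulation of cross-entropy
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```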

Authors (5)
  1. Yaru Hao (16 papers)
  2. Li Dong (154 papers)
  3. Hangbo Bao (17 papers)
  4. Ke Xu (309 papers)
  5. Furu Wei (291 papers)
Citations (10)