Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators (2204.03243v1)

Published 7 Apr 2022 in cs.CL and cs.LG

Abstract: We present a new framework, AMOS, that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA, which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better from challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.

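The following is a minimal sketch of the adversarial mixture mechanism the abstract describes, written in PyTorch. It is not the authors' implementation: the class names (ToyMLM, ToyDiscriminator), the toy dimensions, the random data, and the greedy token replacement are illustrative assumptions, the real AMOS uses transformer MLMs and a transformer main encoder, and the generators' own MLM training is omitted here. The sketch only shows how a Gumbel-Softmax sample over learnable mixture weights lets the discriminator loss be minimized by the encoder while being maximized with respect to the mixture.

```python
# Minimal sketch of the adversarial mixture of generators (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, SEQ, BATCH, N_GEN = 1000, 64, 16, 8, 3   # toy sizes (assumed)

class ToyMLM(nn.Module):
    """Stand-in for one auxiliary masked language model (generator)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)
    def forward(self, tokens):
        return self.head(self.emb(tokens))               # [B, T, VOCAB] logits

class ToyDiscriminator(nn.Module):
    """Stand-in for the main encoder that detects replaced tokens."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, 1)
    def forward(self, tokens):
        return self.head(self.emb(tokens)).squeeze(-1)   # [B, T] replaced-token logits

generators    = nn.ModuleList([ToyMLM() for _ in range(N_GEN)])
discriminator = ToyDiscriminator()
mix_logits    = nn.Parameter(torch.zeros(N_GEN))         # learnable mixture weights

disc_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
mix_opt  = torch.optim.Adam([mix_logits], lr=1e-2)

tokens = torch.randint(0, VOCAB, (BATCH, SEQ))           # toy "original" text
mask   = torch.rand(BATCH, SEQ) < 0.15                   # positions to corrupt

# 1) Each generator proposes replacement tokens; the discriminator's
#    replaced-token detection loss is computed per generator.
per_gen_loss = []
for g in generators:
    with torch.no_grad():                                # generators' MLM training omitted
        replacement = g(tokens).argmax(-1)               # greedy replacement, for brevity
    corrupted   = torch.where(mask, replacement, tokens)
    is_replaced = (corrupted != tokens).float()
    per_gen_loss.append(
        F.binary_cross_entropy_with_logits(discriminator(corrupted), is_replaced))
per_gen_loss = torch.stack(per_gen_loss)                 # [N_GEN]

# 2) A Gumbel-Softmax sample over the mixture weights keeps the combined loss
#    differentiable with respect to mix_logits.
weights  = F.gumbel_softmax(mix_logits, tau=1.0)         # [N_GEN], differentiable sample
combined = (weights * per_gen_loss).sum()

# 3) The discriminator descends this loss; the mixture weights ascend it
#    (adversarial), steering pretraining toward harder replacements.
disc_opt.zero_grad(); mix_opt.zero_grad()
combined.backward()
mix_logits.grad.neg_()                                   # flip sign: maximize w.r.t. mixture
disc_opt.step(); mix_opt.step()
```

In this toy version the adversarial signal reaches the mixture weights through the soft Gumbel-Softmax sample that weights the per-generator discriminator losses; how the gradient is routed in the paper's actual unified auxiliary model may differ.
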
Authors (7)
  1. Yu Meng (92 papers)
  2. Chenyan Xiong (95 papers)
  3. Payal Bajaj (13 papers)
  4. Saurabh Tiwary (15 papers)
  5. Paul Bennett (17 papers)
  6. Jiawei Han (263 papers)
  7. Xia Song (38 papers)
Citations (15)
