Pre-Training Transformers as Energy-Based Cloze Models (2012.08561v1)

Published 15 Dec 2020 in cs.CL

Abstract: We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
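
As a rough sketch of the formulation described above (based only on the abstract and the standard noise-contrastive estimation setup, not on the paper's exact equations; the noise distribution q, the number of noise samples k, and the replace(·) notation are illustrative assumptions), an energy-based cloze model and an NCE-style training objective can be written as:

    % An energy-based cloze model defines the conditional probability of token x_t
    % given its context x_{\setminus t} through a scalar energy score E_\theta.
    % replace(x, t, x') denotes the sequence x with position t swapped to x'.
    p_\theta(x_t \mid x_{\setminus t})
      = \frac{\exp\big(-E_\theta(x)_t\big)}
             {\sum_{x' \in \mathcal{V}} \exp\big(-E_\theta(\mathrm{replace}(x, t, x'))_t\big)}

    % Noise-contrastive estimation sidesteps the intractable normalizer by training
    % a binary classifier to distinguish real tokens from tokens drawn from a noise
    % distribution q. With k noise samples per position, the per-position loss is:
    \mathcal{L}_{\mathrm{NCE}} =
      -\,\mathbb{E}_{x_t \sim p_{\mathrm{data}}}\Big[
          \log \frac{\exp(-E_\theta(x)_t)}
                    {\exp(-E_\theta(x)_t) + k\, q(x_t \mid x_{\setminus t})}\Big]
      \;-\; k\,\mathbb{E}_{\hat{x}_t \sim q}\Big[
          \log \frac{k\, q(\hat{x}_t \mid x_{\setminus t})}
                    {\exp(-E_\theta(\hat{x})_t) + k\, q(\hat{x}_t \mid x_{\setminus t})}\Big]

Because the energy score for every position can be computed without masking tokens one at a time, summing the per-position scores gives a cheap likelihood-style score for a whole sentence, which is what underlies the fast n-best re-ranking the abstract mentions.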

Authors (4)
  1. Kevin Clark (16 papers)
  2. Minh-Thang Luong (32 papers)
  3. Quoc V. Le (128 papers)
  4. Christopher D. Manning (169 papers)
Citations (76)
