
Maximizing Efficiency of Language Model Pre-training for Learning Representation (2110.06620v1)

Published 13 Oct 2021 in cs.CL and cs.LG

Abstract: Pre-trained language models in recent years have shown exponential growth in model parameters and compute time. ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models (e.g. BERT) based on masked language modeling (MLM) by addressing the sample inefficiency problem with the replaced token detection (RTD) task. Our work proposes an adaptive early exit strategy to maximize the efficiency of the pre-training process by relieving the model's subsequent layers of the need to process latent features, leveraging earlier layer representations instead. Moreover, by thoroughly investigating the necessity of ELECTRA's generator module, we evaluate an initial approach to the problem that shows promising compute efficiency but has not succeeded in maintaining the model's accuracy.
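
The abstract sketches two ideas: ELECTRA-style replaced token detection and an adaptive early exit that lets confident intermediate representations skip the remaining encoder layers. The snippet below is a minimal PyTorch sketch of that second idea, assuming a per-layer RTD head and a softmax-confidence exit criterion; the class name, threshold, and confidence measure are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of adaptive early exit over a transformer encoder stack.
# The per-layer RTD heads, the confidence measure, and the exit threshold are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=12, exit_threshold=0.95):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )
        # One binary replaced-token-detection (RTD) head per layer, so any
        # intermediate representation can be scored without running deeper layers.
        self.rtd_heads = nn.ModuleList(
            [nn.Linear(d_model, 2) for _ in range(n_layers)]
        )
        self.exit_threshold = exit_threshold

    def forward(self, hidden):
        for depth, (layer, head) in enumerate(zip(self.layers, self.rtd_heads)):
            hidden = layer(hidden)
            logits = head(hidden)  # per-token RTD logits at this depth
            confidence = logits.softmax(-1).max(-1).values.mean()
            # Exit once the intermediate representation is already confident,
            # sparing the remaining layers the forward computation.
            if confidence >= self.exit_threshold:
                return hidden, logits, depth
        return hidden, logits, depth


if __name__ == "__main__":
    encoder = EarlyExitEncoder()
    tokens = torch.randn(2, 16, 256)  # (batch, seq_len, d_model)
    _, _, exit_depth = encoder(tokens)
    print(f"exited after layer {exit_depth}")
```

Using the per-token RTD confidence as the exit signal keeps the criterion aligned with ELECTRA's discriminator objective, but any calibrated per-layer score could serve the same role.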

Authors (5)
  1. Junmo Kang (17 papers)
  2. Suwon Shin (3 papers)
  3. Jeonghwan Kim (20 papers)
  4. Jaeyoung Jo (1 paper)
  5. Sung-Hyon Myaeng (5 papers)
