bert2BERT: Towards Reusable Pretrained Language Models (2110.07143v1)

Published 14 Oct 2021 in cs.CL

Abstract: In recent years, researchers have tended to pre-train ever-larger language models to explore the upper limit of deep models. However, such pre-training requires intensive computational resources, and most models are trained from scratch without reusing existing pre-trained models, which is wasteful. In this paper, we propose bert2BERT, which effectively transfers the knowledge of an existing smaller pre-trained model (e.g., BERT_BASE) to a large model (e.g., BERT_LARGE) through parameter initialization and significantly improves the pre-training efficiency of the large model. Specifically, we extend the previous function-preserving method to Transformer-based language models and further improve it by proposing an advanced knowledge initialization for the large model. In addition, a two-stage pre-training method is proposed to further accelerate the training process. We conduct extensive experiments on representative PLMs (e.g., BERT and GPT) and demonstrate that (1) our method saves a significant amount of training cost compared with baselines including learning from scratch, StackBERT, and MSLT; and (2) our method is generic and applicable to different types of pre-trained models. In particular, bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their sizes. The source code will be publicly available upon publication.
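To make the parameter-initialization idea concrete, the sketch below shows Net2Net-style function-preserving width expansion (FPI), the mechanism bert2BERT extends to Transformer layers; this is a minimal NumPy illustration under assumed names (e.g., `widen_fpi`), not the authors' released code, and the paper's advanced knowledge initialization additionally draws on weights of the upper layer.

```python
import numpy as np

def widen_fpi(W1, b1, W2, new_width, rng=None):
    """Function-preserving width expansion (Net2Net-style FPI) sketch.

    W1: (d_in, d_hidden) weights of the layer being widened
    b1: (d_hidden,) bias of that layer
    W2: (d_hidden, d_out) weights of the following layer
    new_width: target hidden size (>= d_hidden)
    """
    rng = np.random.default_rng() if rng is None else rng
    d_hidden = W1.shape[1]
    assert new_width >= d_hidden

    # Mapping g: each new unit copies an existing one; the first
    # d_hidden units map to themselves, the extras are sampled at random.
    g = np.concatenate([np.arange(d_hidden),
                        rng.integers(0, d_hidden, new_width - d_hidden)])

    # Duplicate columns of W1 and entries of b1 according to g.
    W1_new = W1[:, g]
    b1_new = b1[g]

    # Rescale the next layer's rows so replicated units together
    # contribute exactly what the original unit contributed.
    counts = np.bincount(g, minlength=d_hidden)[g].astype(W2.dtype)
    W2_new = W2[g, :] / counts[:, None]
    return W1_new, b1_new, W2_new

# Sanity check: the widened two-layer stack computes the same function.
x = np.random.randn(4, 8)
W1, b1, W2 = np.random.randn(8, 16), np.random.randn(16), np.random.randn(16, 32)
W1n, b1n, W2n = widen_fpi(W1, b1, W2, new_width=24)
np.testing.assert_allclose((x @ W1 + b1) @ W2, (x @ W1n + b1n) @ W2n, atol=1e-10)
```

Because duplicated hidden units produce identical activations and their outgoing weights are divided by the replication count, the expanded model computes the same function as the small one at initialization, which is what allows pre-training of the large model to start from a useful point rather than from scratch.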

Authors (10)
  1. Cheng Chen (262 papers)
  2. Yichun Yin (27 papers)
  3. Lifeng Shang (90 papers)
  4. Xin Jiang (242 papers)
  5. Yujia Qin (41 papers)
  6. Fengyu Wang (18 papers)
  7. Zhi Wang (261 papers)
  8. Xiao Chen (277 papers)
  9. Zhiyuan Liu (433 papers)
  10. Qun Liu (230 papers)
Citations (52)