Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning (2210.10293v1)

Published 19 Oct 2022 in cs.CL

Abstract: Multiple pre-training objectives fill the vacancy of the understanding capability of single-objective language modeling, which serves the ultimate purpose of pre-trained language models (PrLMs), generalizing well on a mass of scenarios. However, learning multiple training objectives in a single model is challenging due to the unknown relative significance as well as the potential contrariety between them. Empirical studies have shown that the current objective sampling in an ad-hoc manual setting makes the learned language representation barely converge to the desired optimum. Thus, we propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern on arbitrary pre-training objectives. Such a design is lightweight with negligible additional training overhead. To validate our approach, we adopt five objectives and conduct continual pre-training with BERT-base and BERT-large models, where MOMETAS demonstrates universal performance gain over other rule-based sampling strategies on 14 natural language processing tasks.
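The abstract describes MOMETAS only at a high level: an adaptive sampler that maintains a distribution over pre-training objectives and adjusts it from meta-learned feedback, instead of fixing the sampling ratio by hand. As a rough illustration of that idea (not the paper's actual algorithm, whose update rule is not given here), the sketch below keeps one weight per objective and re-weights it from a reward signal such as held-out improvement; the class name, the reward definition, the exponentiated re-weighting, and the objective names are all assumptions.

```python
import math
import random


class AdaptiveObjectiveSampler:
    """Toy adaptive sampler over pre-training objectives.

    Hypothetical sketch: MOMETAS's real meta-learning update is not
    specified in the abstract; a simple exponentiated re-weighting
    driven by an external reward signal stands in for it here.
    """

    def __init__(self, objectives, lr=0.1):
        self.lr = lr
        # Start from uniform (unnormalized) weights over the objectives.
        self.weights = {name: 1.0 for name in objectives}

    def probs(self):
        total = sum(self.weights.values())
        return {name: w / total for name, w in self.weights.items()}

    def sample(self):
        # Draw the objective to train on for the next step/batch.
        p = self.probs()
        return random.choices(list(p.keys()), weights=list(p.values()), k=1)[0]

    def update(self, objective, reward):
        # Meta-step: up-weight objectives whose recent use improved a
        # held-out signal (reward > 0), down-weight them otherwise.
        self.weights[objective] *= math.exp(self.lr * reward)


# Usage sketch with five placeholder objectives (the abstract does not
# name the five objectives used in the paper's continual pre-training).
sampler = AdaptiveObjectiveSampler(["obj_a", "obj_b", "obj_c", "obj_d", "obj_e"])
obj = sampler.sample()            # pick an objective for this training round
# ... train on `obj`, measure a validation change `delta` ...
sampler.update(obj, reward=0.02)  # feed the (hypothetical) reward back
```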

Authors (7)
  1. Hongqiu Wu (22 papers)
  2. Ruixue Ding (9 papers)
  3. Hai Zhao (227 papers)
  4. Boli Chen (23 papers)
  5. Pengjun Xie (85 papers)
  6. Fei Huang (410 papers)
  7. Min Zhang (632 papers)
Citations (8)
