Investigating Forgetting in Pre-Trained Representations Through Continual Learning (2305.05968v1)

Published 10 May 2023 in cs.CL

Abstract: Representation forgetting refers to the drift of contextualized representations during continual training. Intuitively, representation forgetting can affect the general knowledge stored in pre-trained language models (LMs), but its concrete effect is still unclear. In this paper, we study the effect of representation forgetting on the generality of pre-trained language models, i.e., their potential capability for tackling future downstream tasks. Specifically, we design three metrics, including overall generality destruction (GD), syntactic knowledge forgetting (SynF), and semantic knowledge forgetting (SemF), to measure the evolution of general knowledge during continual learning. Through extensive experiments, we find that generality is destroyed in various pre-trained LMs and that syntactic and semantic knowledge is forgotten through continual learning. Based on our experiments and analysis, we derive two insights into alleviating general knowledge forgetting: 1) training on general linguistic tasks first can mitigate general knowledge forgetting; 2) a hybrid continual learning method can mitigate generality destruction and maintain more general knowledge compared with methods that only use rehearsal or regularization.
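The abstract's second insight refers to a hybrid continual learning method that combines rehearsal with regularization. The paper's exact formulation is not given on this page, but the sketch below illustrates what such a hybrid training step could look like, assuming a standard experience-replay buffer and an EWC-style quadratic penalty; the function name, hyperparameters, and data layout are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: one hybrid continual-learning step that mixes
# rehearsal (replaying stored examples from earlier tasks) with an
# EWC-style regularization penalty on parameter drift.
import random
import torch
import torch.nn.functional as F


def hybrid_cl_step(model, optimizer, batch, replay_buffer,
                   fisher, old_params, ewc_lambda=0.4, replay_size=8):
    """Train on a new-task batch while mixing in replayed examples and
    penalizing drift away from parameters important to previous tasks."""
    inputs, labels = batch

    # Rehearsal: append a few stored (input, label) pairs from earlier tasks.
    if replay_buffer:
        replayed = random.sample(replay_buffer, min(replay_size, len(replay_buffer)))
        inputs = torch.cat([inputs, torch.stack([x for x, _ in replayed])])
        labels = torch.cat([labels, torch.stack([y for _, y in replayed])])

    logits = model(inputs)
    loss = F.cross_entropy(logits, labels)

    # Regularization: quadratic penalty weighted by a Fisher-information estimate,
    # anchoring parameters to their values after the previous task.
    for name, param in model.named_parameters():
        if name in fisher:
            loss = loss + (ewc_lambda / 2) * (fisher[name] * (param - old_params[name]) ** 2).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```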

Authors (6)
  1. Yun Luo (33 papers)
  2. Zhen Yang (160 papers)
  3. Xuefeng Bai (34 papers)
  4. Fandong Meng (174 papers)
  5. Jie Zhou (687 papers)
  6. Yue Zhang (620 papers)
Citations (13)
