Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models
Abstract: Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of past activities. In recent years, large-scale pre-trained language models (LLMs) have shown a remarkable ability to memorize knowledge. In contrast, vanilla neural networks without pre-training have long been observed to suffer from catastrophic forgetting. To investigate this retentive-forgetful contradiction and to understand the knowledge memorizing mechanism of LLMs, we conduct thorough experiments in which we control the target knowledge types, the learning strategies, and the learning schedules. We find that: 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; 3) knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained LLMs and shed light on designing and evaluating new learning and inference algorithms for LLMs.
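To make the kind of retention measurement discussed in the abstract concrete, below is a minimal sketch, not the authors' protocol: it probes whether a causal language model still recalls some facts via cloze-style next-token prediction, so the same probe can be re-run after further training to check for forgetting. The model name, the probe facts, and the top-1 scoring rule are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions: model choice, probe set, and scoring are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM suffices for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Hypothetical knowledge probes: (prompt, expected continuation)
probes = [
    ("The capital of France is", " Paris"),
    ("Water is composed of hydrogen and", " oxygen"),
]

@torch.no_grad()
def recall_accuracy(model, tokenizer, probes):
    """Fraction of probes whose expected first token is the model's top prediction."""
    hits = 0
    for prompt, answer in probes:
        inputs = tokenizer(prompt, return_tensors="pt")
        next_token_logits = model(**inputs).logits[0, -1]  # distribution over next token
        answer_id = tokenizer(answer, add_special_tokens=False)["input_ids"][0]
        hits += int(next_token_logits.argmax().item() == answer_id)
    return hits / len(probes)

print("recall before further training:", recall_accuracy(model, tokenizer, probes))
# ... continue training `model` on new, unrelated data, then re-run recall_accuracy
# to see whether recall drops (forgetful) or persists (retentive).
```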