Towards Continual Knowledge Learning of Language Models
The paper "Towards Continual Knowledge Learning of LLMs" sets forth a novel approach to address the challenge of continual learning within the context of LLMs (LMs). As LMs are typically pretrained on large static corpora, there lies an inherent limitation in that the knowledge embedded within these models can swiftly become outdated given the dynamic nature of real-world information. The paper introduces a structured method to continually update and refine the knowledge contained in LMs, while mitigating the potential for catastrophic forgetting—a phenomenon where old knowledge is inadvertently erased as new information is learned.
Key Contributions
The authors conceptualize a new continual learning task, termed Continual Knowledge Learning (CKL), which aims to balance retaining time-invariant knowledge, updating outdated knowledge, and acquiring new knowledge. This is operationalized through a benchmark and corresponding metrics designed specifically to assess CKL performance.
Benchmark Construction
The CKL benchmark is composed of three cloze-style probing datasets (a minimal probing sketch follows the list):
- InvariantLAMA: This dataset evaluates how well models retain time-invariant knowledge. Instances here are designed such that, despite the passage of time and new data, the factual information remains unchanged and should be preserved by the LM.
- UpdatedLAMA: This dataset tests the model’s ability to update outdated information with new, conflicting facts derived from recent data sources.
- NewLAMA: This evaluates the acquisition of new information that is not present in the original training data.
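To make the probing setup concrete, the sketch below shows a LAMA-style cloze evaluation with a Hugging Face seq2seq model. The model name, prompt, and exact-match scoring are illustrative assumptions for the sake of the example, not the benchmark's exact protocol.

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model_name = "t5-base"  # stand-in; the paper's experiments use larger checkpoints
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# A time-invariant fact phrased as a cloze statement; <extra_id_0> marks the blank.
prompt = "The capital of France is <extra_id_0>."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5)
prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()

# Simplified exact-match scoring against the gold answer.
gold = "Paris"
print(prediction, prediction == gold)
```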
In addition to these datasets, the authors propose a metric, FUAR (the Forgotten / (Updated + Acquired) Ratio), to quantify the trade-off between forgotten time-invariant knowledge and newly updated or acquired knowledge.
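As a rough illustration, the following sketch computes this ratio from instance counts. It is a simplified reading of the metric implied by its name; the paper's exact formulation (for example, how counts are aggregated across tasks) may differ, and the numbers below are invented.

```python
def fuar(n_forgotten: int, n_updated: int, n_acquired: int) -> float:
    """Forgotten / (Updated + Acquired) Ratio (lower is better).

    n_forgotten: time-invariant facts answered correctly before continued
                 pretraining but not after (e.g. probed with InvariantLAMA).
    n_updated:   outdated facts correctly revised afterwards (UpdatedLAMA).
    n_acquired:  new facts correctly answered afterwards (NewLAMA).
    """
    gained = n_updated + n_acquired
    if gained == 0:
        return float("inf")  # nothing gained, so any forgetting is unfavourable
    return n_forgotten / gained

# Illustrative counts, not results from the paper:
print(fuar(n_forgotten=120, n_updated=300, n_acquired=500))  # 0.15
```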
Experimental Setup and Findings
The authors adapt existing methods, such as parameter-expansion approaches (LoRA, K-Adapter) and rehearsal strategies (Mix-Review), as baselines within the CKL framework. The results suggest that parameter-expansion methods generally outperform the others, effectively balancing knowledge retention and acquisition, albeit at the cost of increased memory usage.
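To illustrate the parameter-expansion idea, the sketch below attaches LoRA adapters to a pretrained T5 model so that only a small set of added weights is trained on new data while the original parameters stay frozen. It uses the peft library as a modern convenience; this is not how the paper implements its baselines, and the model name and hyperparameters are assumptions.

```python
from transformers import T5ForConditionalGeneration
from peft import LoraConfig, TaskType, get_peft_model

# Freeze the pretrained weights and train only the low-rank adapter matrices
# added to the attention projections.
base_model = T5ForConditionalGeneration.from_pretrained("t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the low-rank update
    lora_alpha=32,              # scaling applied to the adapter output
    lora_dropout=0.1,
    target_modules=["q", "v"],  # query/value projections in T5 attention blocks
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# `model` can now be continually pretrained on a new corpus with a standard
# training loop or the transformers Trainer.
```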
The paper also highlights the critical roles of learning rate, memory usage, and the number of training epochs in mitigating knowledge forgetting. In particular, repeated exposure to the same data across successive epochs is identified as a key driver of knowledge degradation.
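Rehearsal methods such as Mix-Review counter this degradation by replaying part of the original pretraining corpus alongside the new corpus. The sketch below conveys the idea with toy data; the decaying replay ratio, the linear schedule, and the starting ratio of 0.4 are illustrative assumptions, not the method's exact formulation.

```python
import random

def mix_review_batch(new_corpus, old_corpus, step, total_steps,
                     initial_mix_ratio=0.4, batch_size=32):
    """Build one training batch mixing new-corpus documents with documents
    replayed from the original pretraining corpus.

    The replay ratio decays as continued pretraining progresses (assumed
    linear schedule here), so early batches rehearse more old data.
    """
    mix_ratio = initial_mix_ratio * (1 - step / total_steps)
    n_old = int(batch_size * mix_ratio)   # replayed (rehearsal) documents
    n_new = batch_size - n_old            # documents from the new corpus
    batch = random.sample(new_corpus, n_new) + random.sample(old_corpus, n_old)
    random.shuffle(batch)
    return batch

# Toy usage with placeholder "documents":
new_corpus = [f"new_doc_{i}" for i in range(1000)]
old_corpus = [f"old_doc_{i}" for i in range(1000)]
batch = mix_review_batch(new_corpus, old_corpus, step=100, total_steps=1000)
```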
Implications and Future Directions
While the paper’s findings are preliminary, they are promising for both theoretical understanding and practical implementation of continually learning LMs in real-time applications. This has direct implications for AI systems that require up-to-date knowledge, such as those used in dialogue systems, personalized recommendations, and dynamic information retrieval.
Future research avenues involve refining CKL strategies to further reduce memory consumption and improve computational efficiency, potentially making frequent updates feasible for deployment at scale. The integration of these models into live systems could foster more adaptive, responsive, and knowledge-rich AI applications.
In conclusion, this work provides a substantive conceptual and practical foundation for continual learning in LMs, a necessity as the scope of AI applications continues to expand amidst rapidly evolving information landscapes.