
Continual Learning with Pre-Trained Models: A Survey (2401.16386v2)

Published 29 Jan 2024 in cs.LG and cs.CV

Abstract: Nowadays, real-world applications often face streaming data, which requires the learning system to absorb new knowledge as data evolves. Continual Learning (CL) aims to achieve this goal and meanwhile overcome the catastrophic forgetting of former knowledge when learning new ones. Typical CL methods build the model from scratch to grow with incoming data. However, the advent of the pre-trained model (PTM) era has sparked immense research interest, particularly in leveraging PTMs' robust representational capabilities. This paper presents a comprehensive survey of the latest advancements in PTM-based CL. We categorize existing methodologies into three distinct groups, providing a comparative analysis of their similarities, differences, and respective advantages and disadvantages. Additionally, we offer an empirical study contrasting various state-of-the-art methods to highlight concerns regarding fairness in comparisons. The source code to reproduce these evaluations is available at: https://github.com/sun-hailong/LAMDA-PILOT

Citations (36)

Summary

  • The paper presents a comprehensive comparison of PTM-based continual learning methods, detailing prompt, representation, and model mixture approaches to address catastrophic forgetting.
  • It demonstrates that representation-based methods often outperform others by leveraging inherent, generalizable features from pre-trained models.
  • Empirical insights reveal challenges with domain gaps and highlight future research avenues for improving efficiency and fairness in continual learning.

Introduction to Pre-Trained Model-based Continual Learning

Continual Learning (CL) remains a significant challenge in AI because of catastrophic forgetting, where integrating new knowledge degrades previously learned information. Pre-trained models (PTMs), built on extensive datasets with advanced training techniques, offer a promising way forward: they provide a strong, generalizable foundation for downstream tasks, making PTM-based CL a central topic in current research. The paper organizes existing methods into three categories, prompt-based, representation-based, and model mixture-based, and offers a comprehensive comparison of their attributes and performance on several benchmark datasets.

Prompt-based Methods in PTM-based CL

Prompt-based methods capitalize on the power of PTMs by introducing lightweight, tunable prompts that bridge the domain gap between the foundational model and the requirements of new tasks. The paper further explores how prompt pools can act as external memory, supporting adaptive knowledge retrieval. However, these methods can suffer from forgetting at the prompt level and from a selection bottleneck, which makes an efficient prompt selection mechanism essential. Moreover, the paper points out that some prompt-based methods, such as DAP, rely on batch-level information at test time, which can confer an unfair performance advantage and distort comparisons.
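
As a rough illustration of the prompt-pool idea, the sketch below implements key-query prompt selection in the spirit of L2P, assuming a frozen ViT-style backbone that produces a per-image query vector; the module name, dimensions, and hyperparameters are illustrative rather than taken from any specific surveyed implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Illustrative prompt pool with key-query matching (L2P-style sketch).

    A frozen PTM encodes the input into a query vector; the top-k prompts
    whose learnable keys are most similar to the query are selected and
    prepended to the patch embeddings before the transformer blocks.
    """
    def __init__(self, pool_size=10, prompt_len=5, embed_dim=768, top_k=5):
        super().__init__()
        # Learnable prompt tokens and their matching keys (sizes are assumptions).
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, embed_dim) * 0.02)
        self.keys = nn.Parameter(torch.randn(pool_size, embed_dim) * 0.02)
        self.top_k = top_k

    def forward(self, query):
        # query: [B, D] feature of each input from the frozen encoder.
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)  # [B, P]
        idx = sim.topk(self.top_k, dim=1).indices        # [B, top_k]
        selected = self.prompts[idx]                     # [B, top_k, L, D]
        batch = query.size(0)
        # Flatten selected prompts into a single token sequence per sample.
        return selected.reshape(batch, -1, self.prompts.size(-1))
```

Only the prompts and keys are trained; the backbone stays frozen, which is what keeps these methods lightweight.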

Representation-based Methods and Their Advantages

Representation-based strategies take full advantage of the innate generalization ability of PTMs. Methods like SimpleCIL build class prototypes directly from frozen pre-trained features and show surprisingly competitive performance against more complex models, demonstrating that the generalizable features of PTMs are often sufficient on their own for many downstream tasks. Other approaches, such as ADAM, additionally finetune on downstream data to capture task-specific knowledge. These methods are particularly lightweight to update, underscoring their utility in practical applications.
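
To make the prototype idea concrete, the following sketch computes class prototypes from frozen PTM features and classifies by nearest prototype, in the spirit of SimpleCIL; `encoder` and the data loader are placeholders, and the cosine-similarity choice is an assumption rather than the exact rule used in the surveyed work.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_prototypes(encoder, loader, num_classes, feat_dim, device="cpu"):
    """Average frozen PTM features per class to obtain one prototype each.

    `encoder` is any frozen feature extractor (e.g. a ViT backbone);
    `loader` yields (images, labels) for the classes seen so far.
    """
    sums = torch.zeros(num_classes, feat_dim, device=device)
    counts = torch.zeros(num_classes, device=device)
    for x, y in loader:
        y = y.to(device)
        feats = encoder(x.to(device))                       # [B, feat_dim]
        sums.index_add_(0, y, feats)
        counts.index_add_(0, y, torch.ones_like(y, dtype=torch.float))
    return sums / counts.clamp(min=1).unsqueeze(1)          # [num_classes, feat_dim]

@torch.no_grad()
def predict(encoder, x, prototypes):
    """Nearest-prototype classification via cosine similarity."""
    feats = F.normalize(encoder(x), dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    return (feats @ protos.t()).argmax(dim=1)
```

Because only running class means are stored, adding a new task reduces to averaging its features, with no gradient updates to the backbone.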

Considering Model Mixture-based Methods

Model mixture-based approaches blend multiple models into a unified prediction. This can diversify decisions, increase robustness, and balance the trade-off between retaining former knowledge and adapting to new tasks. Despite these benefits, both model ensembles and model merging can incur substantial memory costs and rely on largely ad hoc choices about how the final model is composed.
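
The sketch below shows one simple form of model merging, parameter-wise interpolation between an old and a new model with identical architectures; the function name and interpolation coefficient `alpha` are illustrative, not the specific merging scheme of any surveyed method.

```python
import copy
import torch

def merge_models(old_model, new_model, alpha=0.5):
    """Parameter-wise interpolation: merged = alpha * old + (1 - alpha) * new.

    This is one simple instance of model merging; ensembles instead keep
    both models and combine their predictions at inference time.
    """
    merged = copy.deepcopy(new_model)
    merged_state = merged.state_dict()
    old_state = old_model.state_dict()
    new_state = new_model.state_dict()
    for name in merged_state:
        # Interpolate floating-point weights; leave integer buffers untouched.
        if merged_state[name].dtype.is_floating_point:
            merged_state[name] = alpha * old_state[name] + (1 - alpha) * new_state[name]
    merged.load_state_dict(merged_state)
    return merged
```

Merging avoids the inference cost of an ensemble but requires choosing `alpha` (or a per-layer schedule), which is exactly the kind of composition decision the paper flags as ad hoc.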

Empirical Insights from Comparative Studies

In an empirical study across multiple datasets, the paper assesses PTM-based CL methods and finds that most struggle on datasets with a significant domain gap from the pre-training data. Representation-based methods tend to perform best, suggesting that prompt-based and model mixture-based approaches could benefit from exploiting richer representations. The paper also questions the complexity of some methods relative to more straightforward baselines, underscoring the importance of fairness and validity in performance comparisons.

Pathways Moving Forward

The paper highlights several promising research avenues: continual learning for pre-trained LLMs, extension beyond single-modality recognition, improved computational efficiency, new benchmarks that stretch beyond the knowledge already encapsulated in PTMs, and theoretical explanations of why PTMs mitigate forgetting.

Final Thoughts

The extensive research and analysis furnished by this paper articulate the compelling potential of PTM-based continual learning. With advancements and insights into categories such as prompt-based, representation-based, and model mixture-based methods, along with critical assessments of their comparative performance, this work outlines the state-of-the-art in this field. It stands as a critical examination of the current methods and serves as a pivotal guide for future directions to innovate and refine continual learning paradigms.
