Continual Learning: A Comprehensive Survey of Theory, Methods, and Applications
Continual learning (CL), also known as incremental learning or lifelong learning, is an area of machine learning that addresses the challenge of learning from non-stationary data, where an intelligent system must acquire, update, and exploit knowledge incrementally. This ability is fundamentally constrained by catastrophic forgetting, where learning new information degrades performance on previously learned tasks. Over the years, significant advances have extended our understanding and broadened the application of continual learning, as documented in a comprehensive survey by Wang et al.
Overview of Key Contributions
The survey by Wang et al. provides an extensive taxonomy of state-of-the-art CL methods by systematically categorizing them into five major approaches: regularization-based, replay-based, optimization-based, representation-based, and architecture-based. This taxonomy facilitates an understanding of how various CL strategies are adapted to address specific challenges in practical applications.
Regularization-Based Approach
This approach adds explicit regularization terms to the loss function to balance learning new tasks against preserving old ones. It has two primary subcategories: weight regularization and function regularization. Weight regularization penalizes changes to parameters that are important for previous tasks, typically weighting them by an importance estimate such as the diagonal of the Fisher Information Matrix (FIM), as implemented by EWC and its variants. Function regularization, by contrast, distills knowledge from the previous model into the current one, as seen in methods like LwF and its extensions.
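The weight-regularization idea can be made concrete with a minimal sketch of the EWC-style quadratic penalty. This is an illustrative reduction, not the authors' implementation: it assumes the parameters are flattened into a single vector and that a diagonal Fisher estimate `fisher_diag` has already been computed after the previous task.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher_diag, lam=100.0):
    """EWC-style penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params      -- current parameter vector (flattened)
    old_params  -- parameters frozen after the previous task (theta*)
    fisher_diag -- diagonal Fisher estimate of per-parameter importance
    lam         -- regularization strength (hypothetical default)
    """
    diff = params - old_params
    return 0.5 * lam * np.sum(fisher_diag * diff ** 2)
```

In training, this term would be added to the new task's loss, so that parameters with high Fisher importance are anchored near their old values while unimportant ones remain free to change.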
Replay-Based Approach
Replay-based methods approximate and recover past data distributions in order to counter catastrophic forgetting. This category is subdivided into experience replay, generative replay, and feature replay: experience replay retains a small buffer of older training samples, generative replay draws samples from a learned generative model, and feature replay stores and replays statistics of the feature space. Although promising, replay-based methods face challenges such as designing memory-efficient buffers and avoiding overfitting to the replayed samples.
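A common way to maintain the fixed-size buffer that experience replay relies on is reservoir sampling, which keeps a uniform sample of the stream seen so far. The sketch below is one minimal, assumed implementation, not a specific method from the survey:

```python
import random

class ReservoirBuffer:
    """Fixed-capacity replay buffer using reservoir sampling, so every
    sample seen so far has equal probability of being retained."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # Replace a stored sample with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, k):
        """Draw a mini-batch of up to k stored samples for replay."""
        return random.sample(self.data, min(k, len(self.data)))
```

During training on a new task, batches drawn from such a buffer are typically mixed with the current task's batches, so the loss is computed over both old and new data.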
Optimization-Based Approach
Optimization-based methods intervene in the optimization process itself, for example through gradient projection that balances stability and plasticity. This category also includes meta-learning strategies that optimize gradient directions based on experience, and methods that shape robust loss landscapes to facilitate task transitions while minimizing interference, as demonstrated by OML and related works.
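Gradient projection can be illustrated with the A-GEM-style rule: if the new-task gradient conflicts with a reference gradient computed on replayed data, the conflicting component is removed. This is a simplified sketch assuming both gradients are flattened into vectors:

```python
import numpy as np

def agem_project(grad, ref_grad):
    """A-GEM-style projection: if grad conflicts with the reference
    gradient from old-task data (negative dot product), subtract the
    component of grad along ref_grad so old-task loss does not increase.
    """
    dot = grad @ ref_grad
    if dot < 0:
        return grad - (dot / (ref_grad @ ref_grad)) * ref_grad
    return grad
```

After projection, the update direction has a non-negative inner product with the old-task gradient, which is the stability constraint this family of methods enforces.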
Representation-Based and Architecture-Based Approaches
Representation-based methods leverage robust representations, often obtained through self-supervised learning or large-scale pre-training, to improve generalization and reduce forgetting. Architecture-based approaches construct adaptive task-specific architectures or modular networks that segregate parameters across tasks, thus avoiding interference, as seen with methods like Progressive Networks.
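One simple form of architecture-based parameter isolation is a shared feature extractor with a separate output head per task, so that learning a new task never overwrites another task's head. The class below is a toy numpy sketch of this idea (the class name, dimensions, and random initialization are all illustrative assumptions):

```python
import numpy as np

class MultiHeadModel:
    """Shared feature extractor with one output head per task.
    Only the active task's head is trained, isolating task-specific
    parameters from one another."""

    def __init__(self, in_dim, feat_dim, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.rng = rng
        self.shared = rng.normal(size=(in_dim, feat_dim))  # shared weights
        self.heads = {}                                    # task_id -> head

    def add_task(self, task_id, n_classes):
        """Allocate a fresh head when a new task arrives."""
        self.heads[task_id] = self.rng.normal(
            size=(self.shared.shape[1], n_classes))

    def forward(self, x, task_id):
        h = np.maximum(x @ self.shared, 0.0)  # ReLU features
        return h @ self.heads[task_id]        # task-specific logits
```

Methods like Progressive Networks go further by also growing the shared backbone per task and adding lateral connections, but the head-per-task pattern already captures the core interference-avoidance idea.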
Implications and Outlook
Wang et al.’s survey provides insights into effective strategies for addressing the stability-plasticity dilemma and improving generalizability across tasks. It highlights general objectives such as ensuring a proper stability-plasticity trade-off and adequate intra-/inter-task generalizability while considering resource efficiency. The discussion of practical applications of continual learning in areas like object detection, semantic segmentation, and reinforcement learning underscores the broad applicability and real-world relevance of CL research.
The survey also underscores the increasing use of pre-training and self-supervised learning for obtaining robust initial representations, which have been shown to significantly reduce the impact of catastrophic forgetting. This paves the way for more cross-domain and interdisciplinary applications in heterogeneous data contexts, potentially integrating neural architecture search, efficient memory utilization, and context-aware systems.
Looking forward, continued research in CL is expected to refine its theoretical foundations and extend its applications across diverse domains, from foundational AI systems to neuroscientific studies of biological learning. Advances at the intersection of CL and robust large-scale models, like transformers in foundational AI, hold the potential for rich, multi-modal learning systems capable of adapting to rapidly changing environments with minimal interference.