Continual Learning: Addressing Catastrophic Forgetting
Overview
Continual learning in artificial neural networks focuses on models that learn from a sequence of tasks, accumulating knowledge over time without retraining on data from prior tasks. This paradigm aims to tackle catastrophic forgetting, the phenomenon in which a network loses previously acquired knowledge as it learns new information. The paper under review advances this direction by systematically analyzing and comparing several state-of-the-art continual learning methods on classification tasks.
Stability-Plasticity Trade-off
A central aspect of continual learning is the stability-plasticity trade-off: a network must be plastic enough to absorb new knowledge yet stable enough not to forget what it has already learned. The paper presents an evaluation framework that controls this trade-off dynamically, setting the stability-related hyperparameters using only the current task's data, thereby adhering to a true continual learning setting.
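To make this idea concrete, the sketch below shows one way such a dynamic search could look, assuming a single stability hyperparameter (for example, a regularization strength) that is relaxed until the model reaches an acceptable accuracy on the current task alone. The function name tune_stability, the callables train_fn and eval_fn, and the default values are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch (not the paper's implementation) of a dynamic
# stability-plasticity search: start from a strongly stable setting and
# relax it until the current task is learned well enough.

def tune_stability(train_fn, eval_fn, lam_init=400.0, decay=0.5,
                   target_acc=0.80, max_trials=5):
    """Select a regularization strength using only the current task's data.

    train_fn(lam) -> model trained on the current task with strength lam
    eval_fn(model) -> accuracy on a held-out split of the current task
    """
    lam = lam_init
    model = None
    for _ in range(max_trials):
        model = train_fn(lam)            # train with the current stability level
        if eval_fn(model) >= target_acc:
            return model, lam            # plastic enough: keep this setting
        lam *= decay                     # too rigid: relax the regularizer
    return model, lam                    # fall back to the most plastic setting tried
```

Because only the current task's training and validation data appear in the loop, the search never touches previous tasks, which is exactly the constraint the framework imposes.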
Methodological Breadth
The evaluation framework scrutinizes several continual learning strategies, which fall into three groups: replay methods, regularization-based approaches, and parameter isolation strategies. Replay methods, such as iCaRL and GEM, store raw input samples from previous tasks, or generated pseudo-samples, and reuse them to retain old knowledge when learning new tasks. Regularization-based methods add regularization terms that consolidate prior knowledge, exemplified by Learning without Forgetting (LwF) and Elastic Weight Consolidation (EWC); a minimal sketch of such a penalty appears below. Parameter isolation techniques, including PackNet and HAT, dedicate different subsets of model parameters to each task so that learning new information cannot overwrite parameters important for older knowledge.
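As a concrete illustration of the regularization-based family, the snippet below sketches an EWC-style quadratic penalty in PyTorch. It assumes that fisher and old_params are dictionaries holding importance weights and parameter values saved after the previous task; the function ewc_penalty and the default strength lam are illustrative, not the paper's or the original authors' implementation.

```python
# Minimal sketch of an EWC-style quadratic penalty (illustration, not the paper's code):
# each parameter is anchored to its value after the previous task, weighted by an
# importance estimate (the diagonal Fisher information in EWC).

import torch
import torch.nn as nn

def ewc_penalty(model: nn.Module, old_params: dict, fisher: dict, lam: float = 100.0):
    """Return lam/2 * sum_i F_i * (theta_i - theta_i_old)^2 over anchored parameters."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# During training on a new task, the total loss would then be:
#   loss = task_loss + ewc_penalty(model, old_params, fisher)
```

The same structure carries over to related methods such as MAS, which differ mainly in how the per-parameter importance weights are estimated.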
Experimental Insights
The experiments span three benchmarks: Tiny ImageNet and the larger-scale iNaturalist and RecogSeq sequences, the latter two presenting unbalanced class distributions and more complex recognition tasks. These setups evaluate the methods under conditions closer to real-world applications.
PackNet emerges as the strongest performer across all three setups, since its parameter isolation prevents forgetting entirely; however, it is confined to the initial model capacity, which limits how many tasks it can accommodate. Among regularization-based methods, Memory Aware Synapses (MAS) stands out for its robustness to hyperparameter settings. iCaRL becomes increasingly competitive as its exemplar memory grows, owing to an effective exemplar-management strategy.
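To make the exemplar-management point concrete, the sketch below shows a herding-style selection in the spirit of iCaRL: samples of a class are picked greedily so that the mean of the selected exemplars stays close to the true class mean. It assumes class samples are already represented as feature vectors; the function select_exemplars is an illustration, not the authors' implementation.

```python
# Herding-style exemplar selection sketch (illustrative, not iCaRL's original code):
# greedily pick samples whose running feature mean best approximates the class mean.

import numpy as np

def select_exemplars(features: np.ndarray, m: int) -> list:
    """features: (n, d) feature vectors of one class; m: exemplar memory budget."""
    m = min(m, len(features))
    class_mean = features.mean(axis=0)
    selected, running_sum = [], np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # mean the exemplar set would have if each candidate were added next
        candidate_means = (running_sum + features) / k
        dists = np.linalg.norm(class_mean - candidate_means, axis=1)
        dists[selected] = np.inf          # never pick the same sample twice
        idx = int(np.argmin(dists))
        selected.append(idx)
        running_sum += features[idx]
    return selected
```

Replay methods then mix the stored exemplars into the training batches of later tasks, which is why a larger memory budget translates directly into better retention.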
Considerations and Future Directions
The investigation sheds light on the negligible impact of task ordering on method performance, suggesting that methodological robustness largely transcends the sequence in which tasks are introduced. The analysis also indicates that very deep architectures are not well suited to this continual learning setup, and pinpoints favorable conditions, such as balanced and wide model capacities, under which current continual learning methods thrive.
Despite this progress, the paper underscores the need for future research to move beyond classification tasks and multi-head evaluation, and highlights the remaining challenges of designing methods that are not restricted by the initial model size and that can handle open-ended task sequences. It also motivates continual learning solutions that respect user privacy, given that some current methods require retaining raw data.
Conclusion
This thorough comparative study is a concrete step toward resolving catastrophic forgetting through continual learning. It clarifies the strengths, weaknesses, and applicability of current methods across varied data settings, and it lays a foundation for more sophisticated models that embrace the dynamism of the real world, remain cognizant of the balance between stability and plasticity, and operate within practical memory constraints.