Class-incremental Learning via Deep Model Consolidation (1903.07864v4)

Published 19 Mar 2019 in cs.CV and cs.LG

Abstract: Deep neural networks (DNNs) often suffer from "catastrophic forgetting" during incremental learning (IL) --- an abrupt degradation of performance on the original set of classes when the training objective is adapted to a newly added set of classes. Existing IL approaches tend to produce a model that is biased towards either the old classes or new classes, unless with the help of exemplars of the old data. To address this issue, we propose a class-incremental learning paradigm called Deep Model Consolidation (DMC), which works well even when the original training data is not available. The idea is to first train a separate model only for the new classes, and then combine the two individual models trained on data of two distinct set of classes (old classes and new classes) via a novel double distillation training objective. The two existing models are consolidated by exploiting publicly available unlabeled auxiliary data. This overcomes the potential difficulties due to the unavailability of original training data. Compared to the state-of-the-art techniques, DMC demonstrates significantly better performance in image classification (CIFAR-100 and CUB-200) and object detection (PASCAL VOC 2007) in the single-headed IL setting.

An Analytical Examination of Class-Incremental Learning via Deep Model Consolidation

The paper "Class-incremental Learning via Deep Model Consolidation" addresses the persistent issue of catastrophic forgetting in deep neural networks (DNNs) during incremental learning (IL). Specifically, it proposes a methodology termed Deep Model Consolidation (DMC) to counter the abrupt degradation in performance on previously learned classes that DNNs exhibit when adapted to newly added ones. The work advances IL by presenting a strategy that, through a double distillation training objective, avoids bias towards either the old or the new classes even when the original training data is unavailable.

Overview of the Problem and Proposed Solution

Traditional DNN training assumes that a complete dataset covering all classes is available up front, an assumption that breaks down when the model must be updated with newly emerging classes. Naive strategies, such as fine-tuning on the new classes, lead to catastrophic forgetting, where the model rapidly loses knowledge of previously learned concepts. The paper further emphasizes a realistic constraint: the original training data may no longer be accessible, which calls for a memory-efficient mechanism that does not rely on stored exemplars.

DMC splits learning into two stages: first, a separate model is trained only for the new classes using the newly available labeled data; second, this model is combined with the previously trained model via a double distillation objective computed on unlabeled auxiliary data. The two existing models serve as separate teachers, and the consolidated student model absorbs their combined knowledge without requiring the original datasets.
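To make the consolidation step concrete, the sketch below shows one plausible PyTorch rendering of a double-distillation style loss, in which the consolidated student regresses onto the mean-centered logits of the two frozen teachers over batches of unlabeled auxiliary data. The function and model names (`double_distillation_loss`, `old_model`, `new_model`, `student`) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def double_distillation_loss(student_logits, old_logits, new_logits):
    """Hedged sketch of a double-distillation consolidation loss.

    student_logits: [B, n_old + n_new] logits of the consolidated model.
    old_logits:     [B, n_old] logits of the frozen old-class teacher.
    new_logits:     [B, n_new] logits of the frozen new-class teacher.
    Each teacher's logits are mean-centered before the student regresses
    onto their concatenation, so neither teacher's scale dominates.
    """
    targets = torch.cat(
        [old_logits - old_logits.mean(dim=1, keepdim=True),
         new_logits - new_logits.mean(dim=1, keepdim=True)],
        dim=1,
    )
    # L2 regression of student logits onto the concatenated teacher targets.
    return F.mse_loss(student_logits, targets)

# Usage on a batch of unlabeled auxiliary images x (illustrative only):
#   with torch.no_grad():
#       old_logits, new_logits = old_model(x), new_model(x)
#   loss = double_distillation_loss(student(x), old_logits, new_logits)
#   loss.backward(); optimizer.step()
```

Because the loss is computed purely from teacher logits, any sufficiently diverse unlabeled image collection can drive the consolidation, which is what allows DMC to dispense with the original training data.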

Robust Experimental Validation

The paper substantiates the efficacy of DMC through extensive experiments. Performance gains over state-of-the-art methods are demonstrated on CIFAR-100 and CUB-200 for image classification and on PASCAL VOC 2007 for object detection. The numerical results show DMC consistently surpassing existing methods under varied conditions, such as different numbers of classes added per session; for instance, DMC achieves substantial accuracy improvements on the iCIFAR-100 benchmark when classes are learned incrementally in groups of 5, 10, 20, and 50.
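These results are reported in the single-headed IL setting noted in the abstract. For readers unfamiliar with that setting, the sketch below illustrates a generic version of the evaluation: after each incremental session the model is scored over all classes seen so far, with no task identifier available at test time. This is an assumption-laden illustration (classes indexed in order of arrival, one shared output head), not the paper's evaluation code.

```python
import torch

@torch.no_grad()
def single_headed_accuracy(model, loader, num_seen_classes):
    """Accuracy over every class observed so far, predicted from one shared head."""
    model.eval()
    correct = total = 0
    for x, y in loader:                          # loader spans all classes seen so far
        logits = model(x)[:, :num_seen_classes]  # assumes class indices 0..k-1 ordered by arrival
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```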

Implications and Prospects

Practically, DMC provides a foundation for applying IL in environments with stringent data-privacy or storage constraints. Because the auxiliary data can be obtained easily and discarded after consolidation, DMC bypasses the need to store historical datasets. Theoretically, its contribution lies in maintaining unbiased learning by combining signals from two distinct teacher models through publicly available unlabeled auxiliary data, which need neither follow the original data distribution nor cover the task classes.

Future Research Directions

Prospective research building on these findings could more formally quantify the relationship between auxiliary-data characteristics and consolidation performance. DMC could also be combined with exemplar-based strategies to potentially improve accuracy further, and it could be extended to more general scenarios, such as consolidating multiple models with overlapping class sets while maintaining consistent performance.

In conclusion, the paper contributes to the incremental learning domain an effective model consolidation framework that balances retention of legacy knowledge with accommodation of new classes, demonstrating its utility in dynamically evolving data environments.

Authors (8)
  1. Junting Zhang (11 papers)
  2. Jie Zhang (847 papers)
  3. Shalini Ghosh (34 papers)
  4. Dawei Li (75 papers)
  5. Serafettin Tasci (3 papers)
  6. Larry Heck (41 papers)
  7. Heming Zhang (13 papers)
  8. C.-C. Jay Kuo (176 papers)
Citations (318)