End-to-End Incremental Learning (1807.09536v2)

Published 25 Jul 2018 in cs.CV

Abstract: Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.

Authors (5)
  1. Francisco M. Castro (5 papers)
  2. Cordelia Schmid (206 papers)
  3. Karteek Alahari (48 papers)
  4. Manuel J. Marín-Jiménez (5 papers)
  5. Nicolás Guil (4 papers)
Citations (1,073)

Summary

End-to-End Incremental Learning

This paper addresses the challenge of catastrophic forgetting in deep learning models by proposing an end-to-end approach for incremental learning. Catastrophic forgetting, where a network's performance on previously learned classes degrades sharply when it is trained on new ones, is conventionally avoided only by retraining on the entire dataset of old and new classes, a requirement that quickly becomes unsustainable as the number of classes grows. The authors introduce a method that retains knowledge of the old classes while learning new ones, using only a small exemplar set rather than the entire dataset.

Proposed Method

The authors propose an approach that combines cross-entropy and distillation loss functions to achieve incremental learning. The core strategy involves:

  • Representative Memory: A memory buffer that keeps a small set of exemplars from previous classes, ensuring that the model retains its ability to recognize old classes.
  • Cross-Distilled Loss: A novel loss function combining distillation and cross-entropy losses to retain old knowledge and learn new classes simultaneously (a sketch follows this list).
  • Balanced Fine-Tuning: A fine-tuning step that ensures the model is not biased towards new classes by training with a balanced number of samples from both old and new classes.
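
The exact formulation of the loss is given in the paper; the snippet below is only a minimal PyTorch sketch in the same spirit, combining standard cross-entropy over all classes with a temperature-softened distillation term on the old-class logits. The function name, the equal weighting of the two terms, and the temperature value T=2.0 are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_distilled_loss(logits, labels, old_logits, num_old_classes, T=2.0):
    """Hedged sketch of a cross-distilled loss.

    logits:      [B, num_old + num_new] outputs of the current model.
    labels:      [B] ground-truth class indices (old or new classes).
    old_logits:  [B, num_old] logits recorded from the previous model for the
                 same inputs, used as distillation targets.
    """
    # Cross-entropy over all classes learns the new classes and keeps
    # supervision on the stored old-class exemplars.
    ce = F.cross_entropy(logits, labels)

    # Distillation on the old-class outputs: soften both the stored targets and
    # the current predictions with temperature T, then match the distributions.
    soft_targets = F.softmax(old_logits / T, dim=1)
    log_probs = F.log_softmax(logits[:, :num_old_classes] / T, dim=1)
    distill = -(soft_targets * log_probs).sum(dim=1).mean()

    return ce + distill
```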

Incremental Learning Process

The incremental learning process involves several stages:

  1. Construction of Training Set: The training set is built from samples of the new classes together with exemplar samples of the old classes stored in the representative memory. Each sample carries two kinds of label: a classification label and distillation labels (soft targets) produced by the previous model.
  2. Training Process: The combined cross-distilled loss is used to train the network end-to-end, updating all of its weights, including those associated with the old classes.
  3. Balanced Fine-Tuning: An additional fine-tuning step that strikes a balance between the number of samples from old and new classes to prevent bias towards new classes.
  4. Update Representative Memory: Exemplars from the new classes are added to the memory and the per-class quotas are adjusted so that all classes remain equally represented (a sketch of this update follows the list).
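
As a concrete illustration of stage 4, here is a minimal sketch of a representative-memory update assuming a fixed total budget K shared by all classes seen so far, so the per-class quota shrinks as classes accumulate; whether the memory is fixed or grows with the number of classes is a design choice, and the function and argument names below are hypothetical rather than the authors' code.

```python
def update_memory(memory, new_class_exemplars, K):
    """Hedged sketch of a representative-memory update with a fixed budget K.

    memory:               dict mapping class id -> list of exemplars, ordered
                          by importance (most representative first).
    new_class_exemplars:  dict of candidate exemplars for the new classes,
                          ordered the same way (e.g. by herding).
    K:                    total number of exemplars the memory may hold.
    """
    memory.update(new_class_exemplars)
    per_class = K // len(memory)               # quota shrinks as classes accumulate
    for cls in memory:
        memory[cls] = memory[cls][:per_class]  # keep only the highest-ranked exemplars
    return memory
```

For example, with a budget of K = 2000 the quota is 100 exemplars per class after 20 classes and drops to 20 per class once 100 classes have been seen.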

Experimental Results

The paper presents extensive evaluation on CIFAR-100 and ImageNet datasets to demonstrate the effectiveness of the proposed method. Key observations from the experiments include:

  • CIFAR-100: The proposed method consistently achieved state-of-the-art results with incremental steps of 2, 5, 10, and 20 classes, showing significant improvements over iCaRL and LwF.MC.
  • ImageNet: The method also outperformed competing approaches on the much larger ImageNet dataset, which involves more classes per incremental step, demonstrating scalability and robustness.
  • Similar Classes: Experiments with subsets of visually similar classes (e.g., vehicle types and dog breeds) further confirmed the advantage of the proposed end-to-end approach over methods that rely on an external classifier such as NCM (nearest class mean).

Ablation Studies

A series of ablation studies analyzed the impact of various components, such as data augmentation, balanced fine-tuning, and exemplar selection strategies, on the overall performance:

  • Data augmentation significantly enhanced the robustness of the model.
  • Balanced fine-tuning effectively addressed the imbalance between old and new data.
  • The herding selection strategy was found to be the most consistent choice for picking exemplars (a sketch follows this list).
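
Herding selects exemplars one at a time so that the running mean of the selected feature vectors stays as close as possible to the class mean. The NumPy sketch below illustrates that selection rule under the assumption that `features` holds the (e.g. L2-normalized) feature vectors of a single class extracted with the current network; the paper's exact normalization and implementation details are not reproduced here.

```python
import numpy as np

def herding_select(features, m):
    """Hedged sketch of herding: return indices of m exemplars whose running
    feature mean best approximates the class mean."""
    assert m <= len(features)
    class_mean = features.mean(axis=0)
    selected, running_sum = [], np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # Mean obtained by adding each remaining candidate to the current selection.
        candidate_means = (running_sum + features) / k
        dists = np.linalg.norm(class_mean - candidate_means, axis=1)
        dists[selected] = np.inf          # never pick the same sample twice
        idx = int(dists.argmin())
        selected.append(idx)
        running_sum += features[idx]
    return selected
```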

Implications and Future Work

The implications of this research extend to practical applications where models must adapt to new data without compromising existing knowledge. This is especially relevant in fields such as autonomous driving, where an incremental learning approach can ensure continuous improvement as new data becomes available.

Future research directions include exploring dynamic memory allocation strategies and enhancing the selection process for exemplars, potentially incorporating more refined and sophisticated criteria beyond herding selection.

Conclusion

By leveraging a combined cross-distilled loss function and the strategic use of a representative memory, this paper significantly advances the field of incremental learning. The proposed method outperforms existing techniques and demonstrates robust, scalable results, paving the way for future enhancements in continuous learning paradigms.