Large Scale Incremental Learning: A Formal Overview
The paper "Large Scale Incremental Learning," authored by Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu, addresses the challenges and methodologies pertinent to incremental learning in large-scale machine learning environments. It specifically targets the problem of catastrophic forgetting, a phenomenon where a machine learning model's performance degrades significantly on previously learned classes when new classes are incrementally introduced.
Problem Context
Incremental learning aims to enable models to learn new concepts without forgetting previously acquired knowledge. Traditional approaches suffer from two primary issues as the class count grows: (a) the data imbalance between the few stored exemplars of old classes and the abundant data of new classes, and (b) the increasing number of visually similar classes. The paper hypothesizes that these factors jointly degrade the model's performance as the number of classes scales up.
Proposed Methodology: Bias Correction (BiC)
The authors propose a novel method called Bias Correction (BiC) to address these challenges effectively. The hypothesis central to BiC is that the classifier layer, particularly the last fully connected (FC) layer, exhibits a strong bias towards the newer classes due to an imbalance in training data.
Method Overview:
- Training with Distillation Loss: The method first trains the convolutional and fully connected layers with a combined distillation and classification loss. The distillation loss retains knowledge of the old classes by using the previous model's outputs as soft targets (a sketch of this stage follows the list).
- Bias Correction Layer: To mitigate the bias in the classifier layer, the authors introduce a bias correction layer: a linear model with two learnable parameters (a scale and an offset) applied to the new-class logits produced by the last fully connected layer. These two parameters are learned on a small validation set designed to approximate the real data distribution of both old and new classes.
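The two training stages can be made concrete with a short sketch. The PyTorch-style code below is a minimal illustration under stated assumptions, not the authors' implementation: the names (`BiasLayer`, `stage1_loss`), the temperature value, and the tensor layout are assumptions, while the structure (a scale/offset pair applied only to new-class logits, and a classification loss blended with a distillation loss weighted by the fraction of old classes) follows the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiasLayer(nn.Module):
    """Stage-2 correction: a scale and an offset applied only to new-class logits."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # scale for new-class logits
        self.beta = nn.Parameter(torch.zeros(1))   # offset for new-class logits

    def forward(self, logits, num_old_classes):
        old = logits[:, :num_old_classes]                        # left unchanged
        new = self.alpha * logits[:, num_old_classes:] + self.beta
        return torch.cat([old, new], dim=1)


def stage1_loss(current_logits, prev_logits, labels, num_old_classes, T=2.0):
    """Combined distillation + classification loss for stage 1.

    current_logits: outputs of the model being trained (old + new classes).
    prev_logits:    outputs of the frozen previous model (old classes only).
    T is an assumed softening temperature for the distillation term.
    """
    num_new_classes = current_logits.size(1) - num_old_classes
    lam = num_old_classes / (num_old_classes + num_new_classes)

    # Distillation on the old-class logits with a softened softmax.
    log_p = F.log_softmax(current_logits[:, :num_old_classes] / T, dim=1)
    q = F.softmax(prev_logits / T, dim=1)
    distill = -(q * log_p).sum(dim=1).mean()

    # Standard cross-entropy over all (old + new) classes.
    cls = F.cross_entropy(current_logits, labels)

    return lam * distill + (1.0 - lam) * cls
```

In the second stage, everything except the two bias parameters would be frozen, and the corrected logits from `BiasLayer` would be trained with an ordinary cross-entropy loss on the held-out validation split.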
Implementation and Results
The paper substantiates the efficacy of the BiC method with comprehensive experimental evaluations on three datasets: CIFAR-100, ImageNet-1000, and a subset of 10,000 classes from the MS-Celeb-1M dataset. The results indicate that the BiC method outperforms state-of-the-art algorithms, such as iCaRL and EEIL, by significant margins in large-scale scenarios.
Key Numerical Results:
- ImageNet-1000:
  - BiC outperformed EEIL and iCaRL by 18.5% and 26.5%, respectively, in the final step of incremental learning.
  - The average gain was 11.1% over EEIL and 19.7% over iCaRL.
- MS-Celeb-1M (Celeb-10000):
  - BiC showed an average gain of 13.2% over iCaRL across 10 incremental steps.
Theoretical and Practical Implications
The practical implications of this research are significant, especially for applications requiring continuous learning from highly dynamic datasets. The approach's ability to retain old knowledge while effectively incorporating new information makes it particularly valuable in fields such as facial recognition, image classification, and other domains where the class space frequently evolves.
On a theoretical front, the paper highlights the importance of addressing the imbalance between old and new classes and offers a robust solution by pinpointing and correcting the bias in the classifier layer. Moreover, the use of a small validation split to estimate the bias parameters is an efficient way to leverage limited exemplar data (a sketch of such a split follows).
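As a rough illustration of that mechanism, the sketch below builds a train/validation split from per-class sample lists. The function name, data layout, and the 0.1 validation fraction are assumptions for illustration (a placeholder for the small split the paper describes); the property it tries to preserve is that old and new classes contribute equally to the validation set used to fit the bias parameters.

```python
import random


def make_bias_val_split(old_exemplars, new_samples, val_ratio=0.1, seed=0):
    """Carve a small, class-balanced validation set out of the stored
    old-class exemplars and the new-class training data.

    old_exemplars / new_samples: dict mapping class id -> list of samples.
    val_ratio and the data layout are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Let the (small) exemplar budget drive the number of validation samples
    # per class, so old and new classes are equally represented in validation.
    exemplars_per_class = min(len(v) for v in old_exemplars.values())
    n_val_per_class = max(1, int(round(exemplars_per_class * val_ratio)))

    train, val = [], []
    for class_id, samples in {**old_exemplars, **new_samples}.items():
        samples = list(samples)
        rng.shuffle(samples)
        val += [(x, class_id) for x in samples[:n_val_per_class]]
        train += [(x, class_id) for x in samples[n_val_per_class:]]
    return train, val
```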
Future Developments
The findings suggest multiple avenues for future research:
- Scaling and Generalization: Exploring how the BiC method scales to broader applications beyond image classification, potentially in NLP or other data domains.
- Data Augmentation: Investigating stronger data augmentation methods to further improve the early incremental steps, where the results of EEIL indicate augmentation helps.
- Hyperparameter Optimization: Research into optimizing the train/validation split and examining additional bias correction models could yield even more robust results.
In conclusion, the BiC method offers a compelling approach to large-scale incremental learning, addressing critical challenges and indicating promising directions for future advancements in the field.