Causal Distillation in Class-Incremental Learning for Mitigating Catastrophic Forgetting
The paper "Distilling Causal Effect of Data in Class-Incremental Learning" introduces a new causal framework to address the well-documented issue of catastrophic forgetting in Class-Incremental Learning (CIL). This problem arises when a learning model tends to forget previously learned information upon learning new data, a notorious challenge in dynamic data environments that demands efficient management strategies to maintain long-term learning capabilities.
At its core, the paper reframes CIL through the lens of causal inference, aiming to explain the causal mechanisms behind catastrophic forgetting and to use that explanation to improve anti-forgetting methods. The proposed framework is orthogonal to existing techniques such as data replay and feature/label distillation, so it can be combined with them to enrich the arsenal of strategies against forgetting in neural networks.
Key Contributions and Methodologies
- Causal Framework for Understanding Forgetting: The paper places CIL within a causal inference framework to explain why forgetting occurs and how current methods mitigate it. In this view, forgetting happens because the causal effect of old data on the current model vanishes in new training cycles; existing strategies such as data replay and feature/label distillation can be interpreted as partially recovering this lost causal effect.
- Distilling Colliding Effect: To address the storage inefficiency of replay-based methods, the authors propose distilling the "colliding effect" between new and old data, which emulates the causal effect of data replay without storing additional old samples. The method computes this effect by conditioning on a collider in the causal graph, so the old data's influence is preserved within an end-to-end learning process (see the first sketch after this list).
- Incremental Momentum Effect Removal: This component counteracts the prediction bias induced by sequential training steps. It stabilizes predictions across old and new classes by removing the biased causal effect from the classifier's outputs, using a dynamic "head" direction that is updated after each learning increment (see the second sketch after this list).
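The snippet below is a minimal sketch of one plausible way to implement the colliding-effect idea, not the paper's exact loss. It assumes a PyTorch setup in which `old_model.features` returns the frozen old model's embeddings and `new_model` returns class logits; both names, the top-K neighbour scheme, and the temperature `tau` are illustrative assumptions. Each new sample is coupled to its nearest neighbours in the old feature space, and similarity-derived weights modulate the classification term, so the old data's geometry keeps shaping the new training step without replaying stored exemplars.

```python
import torch
import torch.nn.functional as F

def colliding_effect_loss(new_model, old_model, images, labels, k=3, tau=0.1):
    """Weight each sample's classification term by its top-k neighbours in the
    OLD model's feature space, so the old data's influence (via the collider)
    persists in the new training step without storing old exemplars."""
    with torch.no_grad():
        old_feats = F.normalize(old_model.features(images), dim=1)  # (B, D) frozen old embeddings
        sim = old_feats @ old_feats.t()                             # (B, B) cosine similarities
        sim.fill_diagonal_(float('-inf'))                           # exclude self-matches
        nbr_sim, nbr_idx = sim.topk(k, dim=1)                       # (B, k) nearest neighbours
        weights = F.softmax(nbr_sim / tau, dim=1)                   # similarity-derived weights

    log_probs = F.log_softmax(new_model(images), dim=1)             # (B, C) new-model predictions
    # log-likelihood of each anchor's label under its neighbours' predictions
    anchor_labels = labels.unsqueeze(1).expand(-1, k)               # (B, k)
    nbr_log_probs = log_probs[nbr_idx, anchor_labels]               # (B, k)
    return -(weights * nbr_log_probs).sum(dim=1).mean()
```

In practice such a term would be combined with the standard cross-entropy on new classes and whatever distillation loss the base CIL method already uses, since it only supplements rather than replaces them.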
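The second sketch illustrates the bias-removal idea at inference, assuming a simple exponential moving average of batch features as the "head" direction; the class name, the update rule, and the subtraction coefficient `alpha` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class MomentumEffectRemover:
    """Track a moving-average feature direction (the biased "head" direction)
    during training, then strip the feature component along it at inference."""

    def __init__(self, feat_dim, momentum=0.9, alpha=0.5):
        self.head = torch.zeros(feat_dim)   # running estimate of the biased direction
        self.momentum = momentum
        self.alpha = alpha                  # strength of the effect removal

    @torch.no_grad()
    def update(self, feats):
        # call during each incremental training step with a batch of features (B, D)
        self.head = self.momentum * self.head + (1 - self.momentum) * feats.mean(dim=0)

    def debias(self, feats):
        # remove the biased component of the features along the head direction
        h = F.normalize(self.head, dim=0)     # unit head direction
        proj = (feats @ h).unsqueeze(1) * h   # (B, D) projection onto h
        return feats - self.alpha * proj      # de-biased features for the classifier
```

At test time one would pass `remover.debias(features)` to the final classifier, while `remover.update(features)` runs during each incremental training phase so the head direction tracks the current increment.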
Experimental Evaluations
The methodology was evaluated on CIFAR-100 and ImageNet, two standard CIL benchmarks. Extensive experiments show that the proposed distillation consistently improves various state-of-the-art CIL baselines, raising accuracy by 0.72% to 9.06% depending on the dataset and model setup. Notably, the gains are most pronounced when fewer old samples are available for replay.
Implications and Future Directions
The implications of this research are twofold. First, it offers a more storage- and computation-efficient way to retain the influence of old data without large-scale data retention, which matters for large datasets such as ImageNet. Second, it opens new avenues for applying causal inference to other machine learning settings, especially those where data replay is infeasible.
Future work could refine causal-effect distillation to capture more complex dependencies and larger data environments, or adapt the same reasoning to related areas of lifelong learning. Integrating causal inference principles into learning algorithms is a promising route to greater robustness against forgetting, and marks a meaningful shift in how one of continual learning's central challenges is addressed.