This paper introduces Eidetic Learning, a novel method to provably solve catastrophic forgetting in neural networks. The core idea is to exploit the overparameterization of neural networks by iteratively pruning unimportant neurons after training on each task, freezing the remaining important neurons, and then re-initializing the pruned neurons for subsequent tasks. This ensures that the network retains its ability to perform previously learned tasks while learning new ones.
Here's a detailed breakdown of the paper:
Problem Addressed: Catastrophic forgetting, the tendency of neural networks to lose previously acquired knowledge when trained on new tasks.
Proposed Solution: Eidetic Learning, which enforces two conditions, persistence (neurons important to a task remain unchanged by later training) and resistance (unimportant neurons cannot affect important ones), thereby preventing forgetting. This is achieved through iterative pruning, freezing, and recycling of neurons.
Key Concepts:
- EideticNet: A neural network trained using Eidetic Learning.
- Iterative Pruning: After training on a task, unimportant neurons are pruned away until the training accuracy drops below a threshold; a sketch of the full train-prune-freeze-recycle cycle follows this list.
- Freezing: The unpruned neurons (deemed important) are frozen, meaning their weights are no longer updated during subsequent training.
- Recycling: The pruned neurons are re-initialized and used for learning new tasks.
- Nested Subnetworks: The architecture promotes nested subnetworks, allowing subsequent tasks to reuse features learned during previous tasks.
- Task Classifier: A final classifier trained to identify the task ID of a given input, eliminating the need for explicit task information at inference time.
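To show how these concepts fit together, below is a minimal, hedged sketch of one Eidetic Learning cycle in PyTorch, restricted to Linear layers. The helper functions `train_task` and `accuracy`, the pruning schedule, and the threshold values are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Assumed helpers (not shown): train_task() optimizes on the current task while
# zeroing gradients of neurons marked in `frozen`; accuracy() returns the
# training accuracy on the current task's data.

def eidetic_cycle(model, tasks, prune_step=0.1, acc_threshold=0.99):
    frozen = {}  # per-layer boolean masks of neurons frozen by earlier tasks

    for task_loader in tasks:
        train_task(model, task_loader, frozen)            # 1) train on the task

        # 2) Iteratively prune until training accuracy drops below the threshold.
        while accuracy(model, task_loader) >= acc_threshold:
            for module in model.modules():
                if isinstance(module, nn.Linear):
                    # L2 structured pruning over output neurons (rows of weight);
                    # a full implementation would exclude already-frozen neurons.
                    prune.ln_structured(module, name="weight",
                                        amount=prune_step, n=2, dim=0)

        for name, module in model.named_modules():
            if not (isinstance(module, nn.Linear) and prune.is_pruned(module)):
                continue
            keep = module.weight_mask.bool()              # 1 = kept, 0 = pruned
            prune.remove(module, "weight")                # bake the mask in

            # 3) Freeze the surviving neurons: they now belong to this task.
            frozen[name] = frozen.get(name, torch.zeros_like(keep)) | keep

            # 4) Recycle the pruned neurons: re-initialize them so the next
            #    task can train the freed capacity from scratch.
            fresh = torch.empty_like(module.weight)
            nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)
            with torch.no_grad():
                module.weight.copy_(torch.where(frozen[name], module.weight, fresh))

    return model, frozen
```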
Methodology:
- Training: A network is initially trained on a task.
- Pruning: Iterative pruning identifies and removes neurons that are unimportant for the trained task. The paper mentions L1 and L2 weight-magnitude pruning and Taylor pruning as candidate criteria.
- Freezing: The weights of the remaining neurons are frozen.
- Re-initialization: The pruned neurons are re-initialized.
- Synaptic Connection Removal: Synaptic connections are directionally removed from pruned neurons to unpruned neurons in downstream layers.
- Task Classification (Inference): A meta-task classifier is trained on data from all tasks to predict an input's task ID; each input is then routed to the corresponding classifier head (a minimal routing sketch follows).
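Because no task ID is provided at inference time, inference can be pictured as a two-stage forward pass: first predict the task, then apply that task's head. The sketch below uses hypothetical names (`backbone`, `heads`, `task_classifier`) and assumes all heads share the same output dimension; it is not the paper's API.

```python
import torch
import torch.nn as nn

class RoutedEideticNet(nn.Module):
    """Illustrative two-stage inference: predict the task, then use its head."""

    def __init__(self, backbone: nn.Module, heads: nn.ModuleList,
                 task_classifier: nn.Module):
        super().__init__()
        self.backbone = backbone                 # shared, partially frozen trunk
        self.heads = heads                       # one classifier head per task
        self.task_classifier = task_classifier   # meta-classifier over task IDs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)
        # Stage 1: predict which task each input belongs to.
        task_ids = self.task_classifier(features).argmax(dim=1)
        # Stage 2: route each input to the head of its predicted task
        # (assumes all heads produce the same number of classes).
        outputs = [self.heads[t](features[i:i + 1])
                   for i, t in enumerate(task_ids.tolist())]
        return torch.cat(outputs, dim=0)
```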
Guarantees: Eidetic Learning guarantees that once a task is learned, performance on that task will not degrade when training on subsequent tasks.
Benefits Highlighted:
- Efficiency: Linear time and memory complexity.
- Robustness: Handles significantly different subsequent tasks without harming existing tasks.
- Stability: Stable training due to persistent neuron importance.
- Interpretability: Clear understanding of capacity usage per task.
- Maintainability: Layers can be widened to add capacity (a widening sketch follows this list).
- No rehearsal or replay is required.
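On the maintainability point, widening a layer can be pictured as allocating fresh, trainable neurons alongside the frozen ones. The helper below is a hedged sketch of that idea for a Linear layer; the name `widen_linear` is invented for illustration, and widening the downstream layer's input dimension is omitted here.

```python
import torch
import torch.nn as nn

def widen_linear(old: nn.Linear, extra_out: int) -> nn.Linear:
    """Return a wider Linear layer whose first rows reproduce the old (frozen)
    neurons exactly; the extra rows are freshly initialized spare capacity."""
    new = nn.Linear(old.in_features, old.out_features + extra_out,
                    bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:old.out_features].copy_(old.weight)
        if old.bias is not None:
            new.bias[:old.out_features].copy_(old.bias)
    # Note: the frozen rows should remain excluded from gradient updates,
    # and the next layer must be widened to accept the new outputs.
    return new
```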
Implementation Details:
- The paper describes how Eidetic Learning is applied to various layer types, including Linear, Convolutional, Batch Normalization, Residual Connections, and Recurrent Layers.
- Specific attention is paid to Batch Normalization layers to ensure that task-specific statistics are preserved (one possible realization is sketched after this list).
- Residual connections are handled carefully to preserve the immutability guarantees of the network.
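One plausible way to preserve task-specific Batch Normalization statistics, offered here as an assumption about how this could be realized rather than the paper's exact mechanism, is to keep a separate BatchNorm module per task and switch on the active task ID.

```python
import torch
import torch.nn as nn

class TaskBatchNorm2d(nn.Module):
    """Illustrative per-task BatchNorm: each task gets its own running statistics
    and affine parameters, so later tasks cannot overwrite earlier ones."""

    def __init__(self, num_features: int, num_tasks: int):
        super().__init__()
        self.bns = nn.ModuleList([nn.BatchNorm2d(num_features)
                                  for _ in range(num_tasks)])
        self.active_task = 0  # set by the training loop or the task router

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bns[self.active_task](x)
```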
Experiments:
- Permuted MNIST (PMNIST): Achieves competitive results compared to other continual learning methods.
- Sequential CIFAR100: Evaluated with a deep ResNet50 architecture.
- Sequential Imagenette: Demonstrates scalability to high-resolution images using ResNet18.
- Task Routing Evaluation: A comparison between a "perfect" oracle task router versus the implemented method.
Comparison to Related Work: The paper positions Eidetic Learning relative to other continual learning methods such as Continual Learning via Neural Pruning (CLNP), Piggyback, PackNet, Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI), and Mixture of Experts (MoE). It emphasizes advantages in lower hyperparameter complexity, not requiring task IDs at inference time, and integration with PyTorch's pruning library. Its treatment of the representation space, not only the classification space, is highlighted as a distinction from related methods.
Limitations and Future Work:
- The method, as presented, only supports forward transfer of knowledge but can be extended to backward transfer.
- It currently assumes a task-incremental learning setting and does not address class-incremental learning.
- Taylor pruning might not always prune uniformly across layers.
- Sufficient excess capacity in the network is critical for Eidetic Learning to be effective.
Overall Significance: Eidetic Learning offers a provable and practical solution to catastrophic forgetting by strategically managing network capacity through pruning, freezing, and re-initialization. The approach is relatively simple to implement, computationally efficient, and readily integrates with modern neural network architectures.