
Gradient Projection Memory for Continual Learning (2103.09762v1)

Published 17 Mar 2021 in cs.LG and cs.CV

Abstract: The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance based weight update or replay of old data from the memory. In contrast, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks. We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single shot manner and store them in the memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimum to no interference with the past tasks, thereby mitigates forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches.

Citations (230)

Summary

  • The paper introduces Gradient Projection Memory (GPM), a technique that mitigates catastrophic forgetting by constraining gradient updates to directions orthogonal to subspaces important for past tasks.
  • It leverages Singular Value Decomposition to split the gradient space into core and residual subspaces, balancing stability with plasticity.
  • Empirical results on datasets like CIFAR-100 and miniImageNet demonstrate GPM’s competitive accuracy and improved memory efficiency over state-of-the-art methods.

Gradient Projection Memory for Continual Learning: An Analytical Overview

The paper "Gradient Projection Memory for Continual Learning" presents a sophisticated methodology to tackle the notorious challenge of catastrophic forgetting in artificial neural networks (ANNs) during sequential task learning. This issue arises when ANNs, upon learning new tasks, fail to retain knowledge of previously learned tasks, primarily due to unrestricted gradient updates that modify the network's parametric representation.

Methodological Innovation

The crux of this research is Gradient Projection Memory (GPM), which strategically constrains gradient updates to preserve past-task knowledge while accommodating new information. Unlike conventional methods that rely on network expansion, importance-based weight regularization, or rehearsal of stored data, this approach imposes a single, minimalist constraint: gradient updates for new tasks must stay orthogonal to the gradient subspaces deemed important for old tasks.

The paper shows that by projecting new-task gradients onto the orthogonal complement of the subspaces spanned by directions significant to previous tasks, interference with those tasks is reduced and forgetting is mitigated. These orthogonal gradient steps are derived by analyzing the network's representations (activations) after each task with a single-shot Singular Value Decomposition (SVD). This analysis partitions the gradient space into two orthogonal subspaces: the Core Gradient Space (CGS), which captures directions crucial to past tasks, and the Residual Gradient Space (RGS), which remains available for new-task updates without interference.
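To make the mechanics concrete, the following is a minimal NumPy sketch of this two-step procedure under simplifying assumptions: a single fully connected layer whose input activations form the representation matrix, and a plain cumulative-energy rule for choosing how many singular vectors to retain. The function names (update_memory, project_gradient) and the exact threshold criterion are illustrative, not the paper's implementation.

```python
import numpy as np

def update_memory(memory, representations, threshold=0.95):
    """Augment the Gradient Projection Memory with bases from the current task.

    representations: (in_features x n_samples) matrix of layer inputs
                     collected after the task is trained (single shot).
    memory:          (in_features x k) orthonormal bases of the core
                     gradient space from earlier tasks, or None.
    """
    R = representations
    if memory is not None:
        # Discard the part of R already spanned by the stored bases, so
        # only genuinely new directions are added to the memory.
        R = R - memory @ (memory.T @ R)

    # Single-shot SVD of the (residual) representation matrix.
    U, S, _ = np.linalg.svd(R, full_matrices=False)

    # Keep the leading singular vectors whose cumulative energy reaches
    # the threshold; they span the Core Gradient Space for this layer.
    energy = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(energy, threshold)) + 1
    new_bases = U[:, :k]

    return new_bases if memory is None else np.hstack([memory, new_bases])


def project_gradient(weight_grad, memory):
    """Project a weight gradient (out_features x in_features) onto the
    space orthogonal to the stored core gradient space."""
    if memory is None:
        return weight_grad  # first task: unconstrained update
    return weight_grad - (weight_grad @ memory) @ memory.T
```

In a training loop, project_gradient would be applied to each layer's gradient before the optimizer step while learning a new task, and update_memory would be called once per task after its training completes.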

Empirical Validation and Results

The researchers evaluated their method on diverse datasets, including miniImageNet, CIFAR-100, and PMNIST, using architectures such as ResNet and AlexNet. GPM performed on par with or better than state-of-the-art continual learning strategies, demonstrating robustness across both short and long task sequences. For instance, GPM achieved strong classification accuracy with minimal interference when benchmarked against regularization methods such as EWC, mask-based methods such as HAT, and orthogonal-projection approaches such as OGD.

Numerical results underscore that GPM not only achieves strong classification accuracy but also effectively curtails forgetting, as evidenced by backward transfer metrics. Even in the more challenging class-incremental learning setting, the method showed significant promise, with improved memory efficiency and training speed over several established continual learning methods.
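As a point of reference, backward transfer is conventionally computed from a matrix of per-task test accuracies recorded after each training stage; the short illustration below uses the standard definition from the continual learning literature (the accuracy values are made up for the example, not taken from the paper).

```python
import numpy as np

def backward_transfer(acc):
    """acc[i, j] = test accuracy on task j after training through task i.

    BWT averages how much accuracy on earlier tasks changed by the time
    the final task has been learned; negative values indicate forgetting.
    """
    T = acc.shape[0]
    return float(np.mean(acc[T - 1, :T - 1] - np.diag(acc)[:T - 1]))

# Hypothetical three-task run with mild forgetting on tasks 1 and 2.
acc = np.array([
    [0.90, 0.00, 0.00],
    [0.88, 0.85, 0.00],
    [0.87, 0.83, 0.84],
])
print(backward_transfer(acc))  # ((0.87-0.90) + (0.83-0.85)) / 2 = -0.025
```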

Implications and Future Directions

Theoretically, GPM reframes gradient-based continual learning around strategic subspace utilization, balancing the inherent stability-plasticity trade-off. Practically, it obviates the need for network expansion or raw-data storage while maintaining task performance, making it an efficient, scalable solution adaptable across ANN architectures and to domains where data privacy is paramount.

Future research could investigate hybrid models that incorporate GPM with limited data replay strategies to further enhance performance, especially in class-incremental scenarios where challenges persist. Additionally, exploring GPM's integration with emerging neural architectures could unlock further efficiencies, potentially setting new benchmarks in real-world continual learning deployments.

In summary, this paper offers a compelling proposition that positions Gradient Projection Memory as a foundational advancement in continual learning research, paving the way towards more resilient and memory-efficient neural networks.