Gradient Episodic Memory for Continual Learning
"Gradient Episodic Memory for Continual Learning" by David Lopez-Paz and Marc'Aurelio Ranzato addresses a significant challenge in AI: the ability of models to efficiently learn new tasks without forgetting previously acquired knowledge. The paper proposes a novel approach for continual learning, introducing both theoretical foundations and an innovative model called Gradient Episodic Memory (GEM).
Core Contributions
The authors identify three characteristics that set continual learning apart from classical supervised learning: non-iid data, the risk of catastrophic forgetting, and the opportunity to transfer knowledge between tasks. To address these issues systematically, they make the following contributions:
- Metrics for Continual Learning: The paper proposes new metrics to evaluate continual learning models, namely Average Accuracy (ACC), Backward Transfer (BWT), and Forward Transfer (FWT). These metrics provide a comprehensive evaluation beyond mere test accuracy and capture the ability to transfer knowledge to both past and future tasks.
- Gradient Episodic Memory (GEM): GEM is introduced as a model to alleviate catastrophic forgetting while facilitating beneficial knowledge transfer. GEM employs an episodic memory to store and leverage a subset of seen examples, updating model parameters in a constrained manner to prevent performance degradation on previous tasks.
- Empirical Validation: The authors evaluate GEM against state-of-the-art methods, including EWC and iCaRL, on several well-known benchmarks, specifically variants of the MNIST and CIFAR-100 datasets. The empirical results indicate that GEM generally outperforms these baselines, showing minimal forgetting and occasionally achieving positive backward transfer.
Detailed Overview
Problem Formalization
Continual learning is formalized as learning from a sequence of tasks, where each example is observed only once. The framework considers a continuum of data represented as triplets $(x_i, t_i, y_i)$, where $x_i$ is the input, $t_i$ the task descriptor, and $y_i$ the target. Importantly, the examples in the continuum are not drawn iid from a single fixed probability distribution (they are only locally iid within each task), which makes classic ERM training impractical due to catastrophic forgetting.
The model's goal is to construct a predictor capable of performing well on any given task, balancing the memory of past tasks and the acquisition of new knowledge.
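Concretely, the setup can be sketched as follows; the notation mirrors the paper's, though the symbols here are a paraphrase rather than a verbatim transcription.

```latex
% Sketch of the continual learning setup.
% The learner sees a single pass over a continuum of triplets,
% locally iid within each task:
(x_1, t_1, y_1), \ldots, (x_n, t_n, y_n), \qquad (x_i, y_i) \sim P_{t_i},
% and must produce a single predictor that can be queried with any
% task descriptor observed so far:
f_\theta : \mathcal{X} \times \mathcal{T} \to \mathcal{Y}.
```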
Metrics
The proposed metrics for continual learning are described as follows (formal definitions are given after the list):
- Average Accuracy (ACC): Measures the model's mean performance across all tasks after the continuum has been processed.
- Backward Transfer (BWT): Indicates the effect of learning new tasks on the performance of earlier tasks. Positive BWT suggests beneficial backward transfer, while negative BWT signals forgetting.
- Forward Transfer (FWT): Captures the influence of learning task $t$ on the performance of future tasks $k > t$. Positive FWT implies that knowledge from task $t$ aids in learning future tasks more rapidly.
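Formally, the paper defines a matrix $R \in \mathbb{R}^{T \times T}$ whose entry $R_{i,j}$ is the test accuracy on task $t_j$ after the model has finished learning task $t_i$, and denotes by $\bar{b}_j$ the test accuracy on task $t_j$ at random initialization. The three metrics are then:

```latex
\mathrm{ACC} = \frac{1}{T} \sum_{i=1}^{T} R_{T,i}, \qquad
\mathrm{BWT} = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( R_{T,i} - R_{i,i} \right), \qquad
\mathrm{FWT} = \frac{1}{T-1} \sum_{i=2}^{T} \left( R_{i-1,i} - \bar{b}_i \right).
```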
Gradient Episodic Memory (GEM)
GEM maintains an episodic memory $\mathcal{M}_k$ for each task $k$, storing a small subset of its observed examples. During training, GEM formulates the learning objective as a constrained optimization problem: the constraints require that the loss on the memories of previous tasks does not increase, while still allowing it to decrease (positive backward transfer). This is operationalized by projecting the gradient updates so that they respect these constraints.
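In the paper's formulation, when observing a new example $(x, t, y)$ the update solves:

```latex
% GEM update when observing (x, t, y); f_theta^{t-1} denotes the predictor
% state before the update and l(f_theta, M_k) the average loss on the
% episodic memory of task k:
\min_{\theta} \; \ell\big(f_\theta(x, t), y\big)
\quad \text{s.t.} \quad
\ell(f_\theta, \mathcal{M}_k) \le \ell(f_\theta^{\,t-1}, \mathcal{M}_k)
\quad \text{for all } k < t.
% In practice the constraints are enforced locally through the
% gradient-angle condition <g, g_k> >= 0 for every previous task k,
% where g is the gradient of the current loss and g_k the gradient
% of the loss on M_k.
```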
Technically, whenever the proposed gradient violates one of these angle constraints, GEM solves a small Quadratic Program (QP) in the dual to project it onto the feasible region defined by the gradients of the losses on the episodic memories. Because the QP has as many variables as there are previous tasks, rather than as many as model parameters, the projection is computationally efficient while effectively mitigating forgetting.
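As a rough illustration, the projection step can be sketched in NumPy/SciPy as below. This is a minimal sketch, not the paper's reference implementation (which uses a dedicated QP solver); `project_gradient` and its arguments are illustrative names.

```python
import numpy as np
from scipy.optimize import minimize


def project_gradient(g, G_prev):
    """Project the current gradient g so it does not increase the loss on
    any previous task, in the spirit of GEM's constrained update.

    g      : (p,)  proposed gradient for the current task
    G_prev : (k, p) gradients of the losses on the k previous tasks,
             computed on their episodic memories
    Returns g_tilde with <g_tilde, G_prev[i]> >= 0 for all i.
    """
    # If the proposed update does not conflict with any stored task
    # gradient, apply it unchanged.
    if np.all(G_prev @ g >= 0):
        return g

    # Otherwise solve the dual QP:
    #   min_v  1/2 v^T (G G^T) v + (G g)^T v   subject to  v >= 0,
    # and recover the projected gradient as g_tilde = G^T v* + g.
    GGt = G_prev @ G_prev.T
    Gg = G_prev @ g

    def dual(v):
        return 0.5 * v @ GGt @ v + Gg @ v

    k = G_prev.shape[0]
    res = minimize(dual, x0=np.zeros(k), bounds=[(0, None)] * k,
                   method="L-BFGS-B")
    return G_prev.T @ res.x + g
```

In a training loop, `g` would be the gradient of the loss on the current minibatch and each row of `G_prev` the gradient of the loss evaluated on one previous task's episodic memory; the projected gradient is then applied with an ordinary SGD step.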
Experimental Results
The performance of GEM was evaluated on the following datasets:
- MNIST Permutations: Tasks involve randomly permuted pixels of the MNIST digits.
- MNIST Rotations: Tasks involve digit recognition at various rotational angles.
- Incremental CIFAR-100: Tasks progressively introduce new classes from the CIFAR-100 dataset.
GEM exhibited superior performance on these benchmarks:
- In MNIST Rotations, GEM achieved high ACC (matching the upper-bound performance of the shuffled data scenario) while maintaining minimal or positive BWT.
- On Incremental CIFAR-100, GEM outperformed iCaRL across various memory sizes, highlighting its scalability and robustness against catastrophic forgetting.
Future Directions
The research opens several avenues for further investigation:
- Structured Task Descriptors: Future work could explore the integration of rich, structured task descriptors for enhanced zero-shot learning capabilities.
- Memory Management: Advanced strategies for memory management beyond simple last-in retention could be employed.
- Computational Efficiency: Optimizing GEM for faster computation, particularly reducing the cost of computing one gradient per previous task at every update.
Conclusion
The authors successfully bridge a crucial gap in AI by proposing GEM, a model that learns over a continuum of data with minimal forgetting and beneficial knowledge transfer. Their rigorous evaluation protocol and introduction of new metrics set a strong foundation for future research in continual learning, pushing towards more human-like AI systems that can learn efficiently over time without forgetting previously acquired knowledge.