On Tiny Episodic Memories in Continual Learning (1902.10486v4)

Published 27 Feb 2019 in cs.LG and stat.ML

Abstract: In continual learning (CL), an agent learns from a stream of tasks, leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework for decreasing the amount of supervision required by existing learning algorithms. But for successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner with the ability to perform tasks seen in the past is to maintain a small episodic memory that stores a few examples from previous tasks, and to replay these examples when training on future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is only seen once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline that jointly trains on examples from the current task as well as examples stored in the episodic memory significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization; on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class.

Analyzing Tiny Episodic Memories in Continual Learning

The paper "On Tiny Episodic Memories in Continual Learning" presents a comprehensive empirical paper of the efficacy of using small episodic memory in continual learning (CL) frameworks. This paper seeks to elucidate the role of such memory in mitigating catastrophic forgetting, a predominant issue in CL that refers to the inability of a model to retain knowledge of previously learned tasks upon exposure to new ones.

Key Contributions

  1. Experience Replay (ER) Baseline: The paper establishes a simple yet robust baseline for CL, dubbed Experience Replay (ER), which stores a very small episodic memory and trains jointly on it alongside current-task examples. Despite its simplicity and minimal computational cost, this approach surpasses existing state-of-the-art methods specifically designed for CL (see the sketch after this list).
  2. Effectiveness of Tiny Memories: The paper demonstrates that even tiny memories, consisting of a single example per class, offer substantial improvements in model performance, yielding generalization gains between 7% and 17% across diverse benchmarks including Permuted MNIST, Split CIFAR, Split miniImageNet, and Split CUB. Notably, the authors find that repetitive training on these mini-memories enhances generalization rather than hampering it.
  3. Comparison with Other Methods: The paper compares ER against regularization-based and memory-based CL methods, observing that ER's performance significantly exceeds that of methods like EWC and A-GEM, even when memory capacity is critically constrained. These observations hold across multiple datasets, enhancing the generalizability of the claims.
  4. Analysis of Memory Usage: The paper thoroughly analyzes why ER with tiny episodic memories avoids overfitting and generalizes well. It posits that training on subsequent tasks induces a data-dependent regularization effect on past tasks, allowing generalization beyond the stored episodic examples. The authors also critique A-GEM's constraint-only use of memory as inefficient at exploiting this regularization effect, potentially leading to underfitting; the projection sketch further below makes the contrast concrete.
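
The ER baseline from point 1 reduces to a few lines of training logic. Below is a minimal sketch, assuming a PyTorch classifier and a single flat ring buffer; the paper's experiments also consider per-class buffers and other write policies, and all names here (RingBufferMemory, er_step) are illustrative rather than taken from the authors' code.

```python
import random
import torch
import torch.nn.functional as F

class RingBufferMemory:
    """Tiny fixed-size episodic memory with FIFO (ring) overwrite."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.examples = []      # list of (x, y) tensor pairs
        self.insert_at = 0      # next slot to overwrite once full

    def add(self, x, y):
        if len(self.examples) < self.capacity:
            self.examples.append((x, y))
        else:
            self.examples[self.insert_at] = (x, y)
            self.insert_at = (self.insert_at + 1) % self.capacity

    def sample(self, batch_size):
        batch = random.sample(self.examples, min(batch_size, len(self.examples)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def er_step(model, optimizer, x, y, memory, mem_batch_size=10):
    """One ER update: a joint gradient step on the current minibatch
    plus a small batch replayed from episodic memory."""
    xb, yb = x, y
    if memory.examples:
        mx, my = memory.sample(mem_batch_size)
        xb, yb = torch.cat([x, mx]), torch.cat([y, my])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(xb), yb)
    loss.backward()
    optimizer.step()
    # Single-pass protocol: each incoming example is seen exactly once,
    # then handed to the memory's write policy.
    for xi, yi in zip(x, y):
        memory.add(xi, yi)
    return loss.item()
```

Calling er_step once per incoming minibatch over the task stream is the entire method; there is no separate replay phase, which is what keeps the computational cost close to plain fine-tuning.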

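For contrast with ER's direct joint training, A-GEM (referenced in points 3 and 4) uses its memory only as a constraint: it never takes gradient steps on stored examples directly, but projects the current-task gradient whenever it conflicts with the memory gradient. A hedged sketch of that projection step, assuming flattened gradient vectors:

```python
import torch

def agem_project(grad, grad_ref):
    """A-GEM's correction: if the proposed update would increase the loss
    on the episodic memory (negative dot product with the memory gradient
    grad_ref), project it onto the constraint half-space."""
    dot = torch.dot(grad, grad_ref)
    if dot < 0:
        grad = grad - (dot / torch.dot(grad_ref, grad_ref)) * grad_ref
    return grad
```

Because the memory only ever gates the update direction, its regularization signal is used indirectly, which is the inefficiency the paper points to when explaining why A-GEM can underfit relative to ER.
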
Implications and Future Directions

This research posits that tiny episodic memories, when used effectively through ER, present a compelling solution to the perennial problem of catastrophic forgetting in CL. By adhering to a single-pass data protocol and leveraging data stored in memory strategically, the paper suggests that ER can achieve results akin to multi-task learning paradigms without necessitating the storage of entire datasets—a significant practical advantage.

Future work in this area should explore optimizing memory-population strategies, particularly in resource-constrained settings. A hybrid strategy combining reservoir sampling with a structured scheme such as a ring buffer showed promising results in this paper; a sketch of the reservoir write rule follows below. Scaling these methods to more complex, real-world datasets could yield further advances in practical CL applications. Additionally, investigating the theoretical underpinnings of why data-dependent regularization operates effectively in this setup could equip researchers with deeper insight into improving CL frameworks.
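
Reservoir sampling, one half of the hybrid write strategy mentioned above, is simple enough to state inline. A minimal sketch, assuming memory is a plain list used as the buffer and n is the 1-based index of the incoming example in the stream:

```python
import random

def reservoir_update(memory, example, n, capacity):
    """Classic reservoir sampling write: after n stream examples, each one
    is retained in the buffer with equal probability capacity / n."""
    if len(memory) < capacity:
        memory.append(example)
    else:
        j = random.randint(0, n - 1)  # uniform over all n examples seen so far
        if j < capacity:
            memory[j] = example
```

The ring buffer, by contrast, simply overwrites the oldest slot (as in the ER sketch above); a hybrid of the two plausibly combines the uniform stream coverage of reservoir sampling with the balanced structure of the ring buffer.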

Overall, this paper significantly refines our understanding of episodic memory's role in CL and makes a strong case for Experience Replay as a standard baseline in future research.

Authors (7)
  1. Arslan Chaudhry (15 papers)
  2. Marcus Rohrbach (75 papers)
  3. Mohamed Elhoseiny (102 papers)
  4. Thalaiyasingam Ajanthan (33 papers)
  5. Puneet K. Dokania (44 papers)
  6. Philip H. S. Torr (219 papers)
  7. Marc'Aurelio Ranzato (53 papers)
Citations (355)