Efficient Lifelong Learning with A-GEM
In the domain of lifelong learning (LLL), a significant concern is how efficiently a learner can acquire new knowledge over a sequence of tasks while leveraging prior experiences. The paper 'Efficient Lifelong Learning with A-GEM' by Arslan Chaudhry et al. addresses this problem. The authors introduce Averaged Gradient Episodic Memory (A-GEM), an algorithm designed to achieve efficient lifelong learning under strict computational and memory budgets. This review summarizes the key contributions and implications of the work.
Motivations and Innovations
Lifelong learning differs from standard machine learning paradigms in that tasks arrive sequentially and the model is allowed to see each training sample only once. The central challenge is to minimize 'catastrophic forgetting', where the model loses performance on previously learned tasks as it learns new ones. Many existing algorithms, though effective at reducing forgetting, either scale poorly in computation and memory or are inapplicable in the single-pass setting because they require multiple epochs over the same data.
Key contributions of the paper include:
- A More Realistic Evaluation Protocol: The authors propose a protocol that operates under the single-pass constraint and performs hyper-parameter selection on a small, disjoint set of tasks, so that hyper-parameters are not over-tuned to the tasks used for evaluation.
- Introduction of A-GEM: A-GEM builds upon Gradient Episodic Memory (GEM) while minimizing its computational overhead. Instead of enforcing GEM's per-task constraints, which require solving a quadratic program at every update, A-GEM only ensures that the average episodic memory loss over previous tasks does not increase, reducing the update to a single closed-form gradient projection.
- Compositional Task Descriptors: These are leveraged to enhance the model's forward transfer capability, allowing the learner to utilize shared components across tasks and thereby speeding up the learning of new tasks.
- New Metric for Learning Speed: The authors propose the Learning Curve Area (LCA) metric, which quantifies how quickly a model learns a new task, a critical measure in evaluating the efficiency of lifelong learning algorithms.
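The A-GEM update rule described above can be sketched as follows: when the current-task gradient `g` conflicts with a reference gradient `g_ref` computed on a mini-batch sampled from episodic memory (i.e., their inner product is negative), `g` is projected so that it no longer increases the average memory loss. This is a minimal sketch of that projection; the function name `agem_project` is illustrative, not from the paper's code.

```python
import numpy as np

def agem_project(g: np.ndarray, g_ref: np.ndarray) -> np.ndarray:
    """A-GEM gradient projection.

    g     : gradient of the loss on the current task's mini-batch
    g_ref : gradient of the average loss on a mini-batch drawn
            from episodic memory of previous tasks
    """
    dot = np.dot(g, g_ref)
    if dot >= 0:
        # No interference with past tasks: use the gradient as-is.
        return g
    # Remove the component of g that points against g_ref, so the
    # average episodic memory loss is not increased (to first order).
    return g - (dot / np.dot(g_ref, g_ref)) * g_ref
```

After projection, the resulting gradient always has a non-negative inner product with `g_ref`, which is exactly the constraint A-GEM enforces in place of GEM's per-task quadratic program.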
Experimental Results
The empirical evaluation is robust and extensive, covering a variety of benchmarks such as Permuted MNIST, Split CIFAR, Split CUB, and Split AWA. The results demonstrate that A-GEM achieves a superior balance between accuracy and computational efficiency. Specifically:
- A-GEM maintains accuracy comparable to GEM while being approximately 100 times faster and 10 times less memory-intensive.
- Regularization-based methods like EWC show limited effectiveness in the single-pass setting, highlighting the importance of A-GEM's approach.
- The utility of compositional task descriptors is evident, as shown by improved accuracy and learning speed across all algorithms.
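The LCA metric used in these evaluations is, per the paper, the normalized area under the early part of the learning curve: with Z_b denoting the average accuracy after b mini-batch updates on a new task (Z_0 being zero-shot accuracy), LCA at β is (1/(β+1)) Σ_{b=0}^{β} Z_b. A minimal sketch, with an illustrative function name:

```python
def learning_curve_area(zb, beta):
    """Learning Curve Area (LCA) at beta.

    zb   : sequence where zb[b] is the average accuracy after b
           mini-batch updates on the new task (zb[0] = zero-shot).
    beta : number of mini-batches to average over.

    A higher LCA means the model reaches good accuracy in fewer
    updates, i.e., it learns the new task faster.
    """
    return sum(zb[: beta + 1]) / (beta + 1)
```

For example, a curve that climbs from 0.0 to 1.0 over two updates, `[0.0, 0.5, 1.0]`, gives an LCA at β=2 of 0.5.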
Implications
The implications of this work are both practical and theoretical:
- Practical Implications: A-GEM facilitates deploying lifelong learning models in real-world applications where speed and limited memory are critical. This is particularly relevant in environments like edge computing and on-device AI systems where resources are constrained.
- Theoretical Implications: By formulating a more realistic evaluation protocol and introducing LCA, the paper sets a new direction for future research in lifelong learning. It also underscores the importance of formulating constraints that mimic real-world conditions to ensure models' practical applicability.
Future Directions
While A-GEM marks a significant step towards efficient lifelong learning, there are several avenues for future research:
- Enhanced Forward Transfer: Future work should aim at improving the zero-shot and few-shot learning capabilities further, possibly by exploring more nuanced task descriptors or transfer learning techniques.
- Scalability: Investigating how A-GEM behaves with a much larger number of tasks and bigger datasets is a natural next step.
- Broader Applications: Extending the application of A-GEM to reinforcement learning and other domains where lifelong learning is beneficial could yield valuable insights.
In conclusion, the paper presents a cogent argument and a rigorous solution to the efficiency challenges in lifelong learning. A-GEM stands out as a significant contribution, promising enhanced accuracy and substantial computational savings, thus bridging a critical gap in the field.