- The paper introduces the Meta-Experience Replay (MER) algorithm, which combines experience replay with meta-learning to maximize transfer between examples while minimizing interference.
- MER outperforms methods like EWC and GEM in benchmarks, exhibiting enhanced accuracy retention even with limited memory buffers.
- The approach effectively addresses catastrophic forgetting in non-stationary environments, paving the way for robust continual and reinforcement learning.
Learning to Learn without Forgetting: Balancing Transfer and Interference
The paper "Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference" addresses a significant challenge in the domain of continual learning: the persistent issue of catastrophic forgetting when neural networks are trained on non-stationary data distributions. The authors introduce a novel approach to reframe this problem as a convergence of two competing forces—transfer and interference—expressed through the alignment of gradients across examples.
Key Contributions
The paper introduces the Meta-Experience Replay (MER) algorithm, which combines experience replay with optimization-based meta-learning. The key idea is to use gradients from replayed historical examples to shape each parameter update so that transfer across examples becomes more likely and interference less likely. The approach is task-agnostic: it requires no task identifiers, and it encourages gradient alignment not only with already-seen examples but also, implicitly, with examples yet to be learned.
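In practice, MER realizes this objective with Reptile-style first-order meta-learning rather than computing gradient dot products explicitly. The sketch below is a simplified, single-level version of the update in PyTorch; `model`, `loss_fn`, `current`, `buffer`, and all hyperparameter values are illustrative placeholders, and the paper's actual algorithm interleaves within-batch and across-batch meta-updates:

```python
import copy
import random

import torch


def mer_step(model, loss_fn, current, buffer, lr=0.01, gamma=0.3,
             n_batches=2, batch_size=5):
    """One Meta-Experience Replay step (simplified single-level sketch).

    Inner loop: plain SGD over batches of replayed + current examples.
    Outer step: a Reptile-style interpolation back toward the starting
    weights, which implicitly rewards gradient alignment across examples.
    """
    theta_0 = copy.deepcopy(model.state_dict())  # weights before adaptation

    for _ in range(n_batches):
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        batch.append(current)  # the current example joins every batch
        for x, y in batch:     # x, y assumed to be single-example batches
            loss = loss_fn(model(x), y)
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():  # one plain SGD step per example
                    p -= lr * p.grad

    # Meta-update: theta_0 + gamma * (theta_T - theta_0)
    with torch.no_grad():
        theta_T = model.state_dict()
        model.load_state_dict(
            {k: theta_0[k] + gamma * (theta_T[k] - theta_0[k]) for k in theta_T}
        )
```

The interpolation step is what distinguishes this from plain experience replay: the Reptile update approximates a second-order term proportional to the gradient inner products between examples in the inner loop, so it nudges the weights toward regions where the replayed examples' gradients agree.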
Numerical Analysis and Experimental Validation
MER demonstrates superior performance over existing continual learning methods, such as Elastic Weight Consolidation (EWC) and Gradient Episodic Memory (GEM), across multiple benchmarks in continual supervised learning and non-stationary reinforcement learning environments. Particularly noteworthy are the results showing:
- Higher retained accuracy on MNIST Rotations and Permutations than prior methods.
- Strong results even with small replay buffers, which MER maintains via reservoir sampling (see the sketch after this list), indicating robustness in memory-constrained setups.
- Clear gains in highly non-stationary settings such as Omniglot, where MER surpasses the alternatives by a wide margin.
- In reinforcement learning, on non-stationary Catcher and Flappy Bird environments, MER mitigates forgetting and facilitates transfer across tasks while maintaining stable performance.
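The small-buffer results rest on reservoir sampling, which keeps the buffer an unbiased uniform sample of the whole stream without needing task boundaries. A minimal sketch (class and variable names are illustrative):

```python
import random


class ReservoirBuffer:
    """Fixed-size buffer holding a uniform sample of all examples seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0  # total stream length observed so far

    def add(self, example):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)         # buffer not full yet: always keep
        else:
            j = random.randrange(self.n_seen)  # keep with prob capacity / n_seen
            if j < self.capacity:
                self.items[j] = example
```

Every example in the stream ends up in the buffer with equal probability (capacity / n_seen), so the replay distribution tracks the overall stream rather than only the most recent task.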
Theoretical and Practical Implications
This work contributes a conceptual shift in tackling continual learning by advocating a temporally symmetric perspective: the balance between stability and plasticity must account not only for what has already been learned but also for examples the model has yet to see. The adoption of meta-learning principles lets the network shape its own parameter updates so that they remain effective as the environment changes.
Speculation on Future Developments
The implications of MER's approach are substantial, suggesting avenues for meta-learning frameworks that internalize effective weight-sharing dynamics. Future research could explore combining MER with routing networks or dual-memory architectures to further refine the handling of transfer and interference, and pairing it with adaptive optimizers or different neural architectures could improve both efficiency and efficacy.
Conclusion
In summary, this paper presents a substantive advancement in continual learning methodologies through the MER algorithm, effectively navigating the transfer-interference trade-off. The approach promises improved resilience to catastrophic forgetting while facilitating better performance in non-stationary settings, marking a significant step towards more robust and scalable neural learning models.