Experience Replay for Continual Learning (1811.11682v2)

Published 28 Nov 2018 in cs.LG, cs.AI, and stat.ML

Abstract: Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and ideally generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a major hurdle to learning with non-stationary data or sequences of new tasks, and prevents networks from continually accumulating knowledge and skills. We examine this issue in the context of reinforcement learning, in a setting where an agent is exposed to tasks in a sequence. Unlike most other work, we do not provide an explicit indication to the model of task boundaries, which is the most general circumstance for a learning agent exposed to continuous experience. While various methods to counteract catastrophic forgetting have recently been proposed, we explore a straightforward, general, and seemingly overlooked solution - that of using experience replay buffers for all past events - with a mixture of on- and off-policy learning, leveraging behavioral cloning. We show that this strategy can still learn new tasks quickly yet can substantially reduce catastrophic forgetting in both Atari and DMLab domains, even matching the performance of methods that require task identities. When buffer storage is constrained, we confirm that a simple mechanism for randomly discarding data allows a limited size buffer to perform almost as well as an unbounded one.

Experience Replay for Continual Learning

Continual learning remains a formidable challenge for artificial intelligence systems, especially in dynamic environments where tasks and associated data distributions evolve over time. Essential to effective continual learning are the dual objectives of plasticity and stability: the ability to acquire new knowledge while preserving existing knowledge. A recurrent issue in this domain is catastrophic forgetting, where new learning disrupts previously acquired knowledge. The paper "Experience Replay for Continual Learning" presents a method called Continual Learning with Experience And Replay (CLEAR) that addresses catastrophic forgetting within the field of reinforcement learning (RL).

The paper begins by identifying the stability-plasticity dilemma as a central obstacle in continual learning. Traditional RL pipelines often sidestep forgetting by training on all tasks simultaneously, which demands extensive computational resources and is impractical in real-world scenarios where learning must proceed sequentially. The authors also note the limitations of techniques such as Elastic Weight Consolidation (EWC), which relies on synaptic consolidation, and Progressive Networks, which grow the architecture per task; both require explicit knowledge of task boundaries.

CLEAR is introduced as a replay-based technique that combines off-policy learning and behavioral cloning on replayed experience to sustain stability with on-policy learning on fresh experience to maintain plasticity. The backbone of CLEAR comprises three primary components (a loss sketch follows the list):

  • Off-Policy Learning: V-Trace importance-sampling corrections let the agent learn from replayed experience despite the distribution shift between the policy that generated it and the current policy.
  • Behavioral Cloning: Regularizes the current policy toward the policy distributions recorded in the replay buffer, limiting drift away from past behavior.
  • On-Policy Learning: Enables rapid adaptation to new tasks by training on freshly generated experience.
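
To make the combination concrete, the sketch below illustrates one way the three terms could be mixed into a single training objective. This is a hedged illustration rather than the paper's code: the function and tensor names (clear_losses, v_trace_targets, pg_advantages, behavior_probs, behavior_values) and the loss coefficients are assumptions, and the V-Trace targets are assumed to be computed elsewhere.

    # Illustrative sketch (PyTorch) of mixing on-policy actor-critic terms with
    # behavioral-cloning regularizers on replayed data, in the spirit of CLEAR.
    # All names and coefficients are assumptions, not taken from the paper's code.
    import torch
    import torch.nn.functional as F

    def clear_losses(new_batch, replay_batch, model,
                     policy_cloning_coef=0.5, value_cloning_coef=0.005):
        # --- On-policy terms (plasticity): actor-critic losses on fresh experience.
        # V-Trace targets/advantages are assumed precomputed for the batch.
        logits_new, values_new = model(new_batch["obs"])
        log_probs_new = F.log_softmax(logits_new, dim=-1)
        chosen_log_probs = log_probs_new.gather(
            -1, new_batch["actions"].unsqueeze(-1)).squeeze(-1)
        pg_loss = -(new_batch["pg_advantages"] * chosen_log_probs).mean()
        value_loss = F.mse_loss(values_new, new_batch["v_trace_targets"])
        entropy = -(log_probs_new.exp() * log_probs_new).sum(-1).mean()

        # --- Replay terms (stability): keep the current policy and value function
        # close to what was recorded in the buffer when the data was generated.
        logits_rep, values_rep = model(replay_batch["obs"])
        log_probs_rep = F.log_softmax(logits_rep, dim=-1)
        old_probs = replay_batch["behavior_probs"]
        policy_cloning = (old_probs *
                          (old_probs.clamp_min(1e-8).log() - log_probs_rep)).sum(-1).mean()
        value_cloning = F.mse_loss(values_rep, replay_batch["behavior_values"])

        return (pg_loss + 0.5 * value_loss - 0.01 * entropy
                + policy_cloning_coef * policy_cloning
                + value_cloning_coef * value_cloning)

In the full formulation, the actor-critic losses are also applied to replayed batches with V-Trace corrections; the sketch isolates the cloning terms to keep the roles of the three components visible.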

The efficacy of CLEAR is empirically validated across several benchmarks. In a DMLab environment comprising three distinct tasks, the paper demonstrates that catastrophic forgetting is significantly mitigated when using CLEAR compared to purely sequential training, approaching the results of simultaneous training. Notably, CLEAR outperformed EWC and P&C (Progress & Compress), methods that inherently require prior knowledge of task boundaries, on several Atari tasks, showcasing its robustness and simplicity.

Key Numerical Results

  • CLEAR achieved cumulative performance closely paralleling that of simultaneous task training, underscoring the minimal impact of catastrophic forgetting.
  • In the experimental setup with DMLab tasks, CLEAR achieved near-identical performance to isolated task training, substantially exceeding the performance of sequential task training without CLEAR.
  • The efficacy of CLEAR held across different buffer sizes; even a tightly limited buffer substantially prevented catastrophic forgetting (see the buffer sketch below).
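
One standard way to implement the random-discarding mechanism referenced above is reservoir sampling, which maintains a uniform random subset of everything seen so far inside a fixed-capacity buffer, so experience from early tasks remains represented no matter how long training runs. The class below is a minimal sketch under that assumption; the names are illustrative and not taken from the paper.

    # Minimal reservoir-style buffer: the stored items are a uniform random
    # sample of everything ever added, within a fixed capacity.
    import random

    class ReservoirBuffer:
        def __init__(self, capacity):
            self.capacity = capacity
            self.items = []
            self.num_seen = 0

        def add(self, item):
            self.num_seen += 1
            if len(self.items) < self.capacity:
                self.items.append(item)
            else:
                # Keep the new item with probability capacity / num_seen,
                # overwriting a uniformly chosen existing slot.
                j = random.randrange(self.num_seen)
                if j < self.capacity:
                    self.items[j] = item

        def sample(self, batch_size):
            return random.sample(self.items, min(batch_size, len(self.items)))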

Implications and Future Directions

The implications of the CLEAR method are multifaceted. Practically, it offers a scalable and simple solution to continual learning in various real-world applications, eliminating the need for extensive computational overhead or prior task knowledge. Theoretically, it addresses a gap in reinforcement learning by providing a robust mechanism to mitigate forgetting, leveraging the innate stability offered by experience replay augmented with behavioral cloning.

Future research could explore integrated approaches combining CLEAR with parameter-protection methods such as EWC. Additionally, optimizing off-policy corrections within the CLEAR framework might further enhance plasticity without compromising stability. Experimentation in more diverse and complex environments, including those with dynamically shifting action spaces, can provide deeper insights into the adaptability and generalizability of CLEAR.

In conclusion, the CLEAR method offers a compelling blend of simplicity and effectiveness in preventing catastrophic forgetting within continual learning paradigms. By judiciously combining off-policy learning with behavioral cloning, CLEAR sets a new benchmark in reinforcement learning, promising wide-ranging applications in evolving, real-world environments.

Authors (5)
  1. David Rolnick (68 papers)
  2. Arun Ahuja (24 papers)
  3. Jonathan Schwarz (12 papers)
  4. Timothy P. Lillicrap (19 papers)
  5. Greg Wayne (33 papers)
Citations (976)