
Augmenting Replay in World Models for Continual Reinforcement Learning

Published 30 Jan 2024 in cs.LG and cs.AI (arXiv:2401.16650v3)

Abstract: Continual RL requires an agent to learn new tasks without forgetting previous ones, while improving on both past and future tasks. The most common approaches use model-free algorithms with replay buffers, which can help to mitigate catastrophic forgetting but often struggle with scalability due to large memory requirements. Biologically inspired replay suggests replaying experience to a world model, aligning with model-based RL, as opposed to the common setting of replay in model-free algorithms. Model-based RL offers benefits for continual RL by leveraging knowledge of the environment independent of the policy. We introduce WMAR (World Models with Augmented Replay), a model-based RL algorithm with a memory-efficient distribution-matching replay buffer. WMAR extends the well-known DreamerV3 algorithm, which employs a simple FIFO buffer and was not previously tested in continual RL. We evaluated WMAR and DreamerV3 with same-size replay buffers on two scenarios: tasks with shared structure using OpenAI Procgen, and tasks without shared structure using the Atari benchmark. WMAR demonstrated favourable properties for continual RL on metrics for forgetting as well as skill transfer to past and future tasks. Compared to DreamerV3, WMAR showed slight benefits on tasks with shared structure and substantially better forgetting characteristics on tasks without shared structure. Our results suggest that model-based RL with a memory-efficient replay buffer can be an effective approach to continual RL, justifying further research.


Summary

  • The paper introduces WMAR, which integrates a world model with an augmented replay buffer, substantially reducing memory overhead and mitigating catastrophic forgetting.
  • The method improves forward transfer on tasks with shared structure while retaining performance on dissimilar tasks.
  • Experiments on OpenAI Procgen and Atari benchmarks demonstrate WMAR's stability and transfer efficiency compared to a FIFO-only replay baseline.

Augmenting Replay in World Models for Continual Reinforcement Learning

Introduction to Continual Reinforcement Learning (CRL)

Continual reinforcement learning (CRL) is a challenging setting in which an agent encounters a sequence of tasks and must learn new tasks without losing acquired knowledge, while also improving its performance on both previous and future tasks. Traditional reinforcement learning (RL) usually relies on model-free approaches with large replay buffers to mitigate catastrophic forgetting, but these become inefficient due to their vast storage requirements. An alternative is model-based RL using world models, which aligns with the biological inspiration of replay mechanisms observed in mammalian brains. This work introduces World Models with Augmented Replay (WMAR) to address continual RL with a memory-efficient replay system.

World Models with Augmented Replay (WMAR) Framework

WMAR integrates DreamerV3's world model framework with an augmented replay buffer, enabling memory-efficient continual RL. The architecture comprises a recurrent state-space model that simulates the environment's dynamics, reducing the reliance on excessive replay storage. A significant contribution of WMAR is a memory-efficient distribution-matching replay buffer used alongside a short-term FIFO buffer, so that training draws on both recent and historical experience.
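The two-part buffer design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it pairs a short-term FIFO queue with a fixed-size long-term buffer maintained by reservoir sampling, one simple way to keep a roughly uniform sample over all past experience (a crude approximation of distribution matching). All class and method names here are hypothetical.

```python
import random

class AugmentedReplayBuffer:
    """Hypothetical sketch: a short-term FIFO queue for recent
    experience plus a fixed-size long-term buffer that uses reservoir
    sampling to hold a near-uniform sample over everything seen so far."""

    def __init__(self, fifo_capacity, longterm_capacity):
        self.fifo = []                      # recent experience, FIFO order
        self.fifo_capacity = fifo_capacity
        self.longterm = []                  # near-uniform sample over history
        self.longterm_capacity = longterm_capacity
        self.seen = 0                       # total items ever added

    def add(self, item):
        # Short-term buffer: evict the oldest item when full.
        self.fifo.append(item)
        if len(self.fifo) > self.fifo_capacity:
            self.fifo.pop(0)
        # Long-term buffer: standard reservoir sampling keeps each past
        # item with probability longterm_capacity / seen.
        self.seen += 1
        if len(self.longterm) < self.longterm_capacity:
            self.longterm.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.longterm_capacity:
                self.longterm[j] = item

    def sample(self, batch_size):
        # Mix recent and historical experience in one training batch.
        pool = self.fifo + self.longterm
        return random.sample(pool, min(batch_size, len(pool)))
```

Because the long-term reservoir has a fixed capacity, total memory stays constant no matter how many tasks the agent sees, which is the property the paper's memory-efficiency argument rests on.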

WMAR addresses two primary task settings:

  • Tasks with Shared Structure: Leveraging a commonality between tasks allows sharing of learned knowledge across similar tasks.
  • Tasks Without Shared Structure: Highlights the challenge of retaining performance across dissimilar task sets.

Figure 1: Continual learning performance on tasks without shared structure. Bold line segments denote periods in which certain tasks are trained. Scores are normalized per the paper's normalization equation.

Experimental Evaluation

The experimental framework evaluates WMAR against its FIFO-only variant across tasks with and without shared structure using benchmarks from OpenAI ProcGen and Atari environments. The primary metrics for evaluation include stability, backward/forward transfer, and forgetting.

The key observations from the experiments are as follows:

  • Memory Efficiency: WMAR significantly reduces memory overhead while demonstrating improved performance retention, compared to its FIFO-only counterpart, in both shared and non-shared task settings.
  • Forgetting and Transfer: WMAR showcases improved mitigation of catastrophic forgetting, especially in dissimilar task sets. However, this comes at the cost of reduced plasticity, impacting learning efficiency in new tasks.
  • Performance Comparison: On tasks with shared structure, WMAR exhibits positive forward transfer, substantially enhancing learning speed and performance consistency across tasks.

Figure 2: Continual learning performance on tasks with shared structure. Bold line segments denote the periods in which certain tasks are trained. Scores are normalized per the paper's normalization equation.

Implications and Future Outlook

WMAR demonstrates the potential of leveraging model-based RL with augmented replay buffers to address the continual learning problem efficiently. The reduction in memory usage without compromising performance emphasizes the viability of world model approaches for real-world continual RL applications. However, there is a need for further exploration into hyperparameter tuning and combining WMAR with other established techniques like behaviour cloning to address plasticity drawbacks.

In future research, it will be crucial to explore the intricacies of tuning world models to accommodate task variations effectively, particularly addressing the trade-off between stability and plasticity. Moreover, exploring the intersection of supervised tasks with continual learning paradigms may provide significant insights into the broader applicability of WMAR in diverse domains.

Conclusion

The introduction of WMAR marks a notable advancement in continual RL, illustrating the benefits of memory-efficient world models. The exploration of tasks with and without shared structures elucidates the complex dynamics of continual learning in model-based RL. The research evidences substantial progress but highlights avenues for further investigation into efficient model-based solutions for evolving task environments in reinforcement learning. The potential integration of WMAR with complementary RL strategies promises an exciting direction for enhancing agent adaptability and learning robustness.
