
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

Published 29 Sep 2023 in cs.LG (arXiv:2309.17207v6)

Abstract: Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as "I packed my bag". This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation within the open-source CleanRL library that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across our finite and endless tasks. TrXL, on the finite environments, demonstrates superior effectiveness over GRU, but only when utilizing an auxiliary loss to reconstruct observations. Notably, GRU makes a remarkable resurgence in all endless tasks, consistently outperforming TrXL by significant margins. Website and Source Code: https://marcometer.github.io/jmlr_2024.github.io/


Summary

  • The paper introduces Memory Gym, a suite of endless environments designed to benchmark agent memory effectiveness over long periods and cumulative tasks, moving beyond sample efficiency.
  • Empirical results show that GRU-based memory mechanisms surprisingly outperformed Transformer-XL in endless memory tasks, challenging assumptions about attention vs. recurrence for long-term memory.
  • The findings suggest current DRL benchmarks may not fully capture capabilities needed for applications requiring robust, long-term memory and point to the need for new evaluation metrics and architectural research.

Memory Gym: Evaluating the Memory Effectiveness of Agents

The paper "Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents" by Pleines, Pallasch, Zimmer, and Preuss introduces a set of environments designed to evaluate the memory effectiveness of decision-making agents. The suite comprises three primary environments—Mortar Mayhem, Mystery Path, and Searing Spotlights—crafted to assess agents' ability to retain and use memory over extended interactions. The authors address a critical need for benchmarks that emphasize not merely sample efficiency but also an agent's ability to use memory effectively in dynamic scenarios.

Key Contributions

The study's primary contribution lies in the development of endless tasks that simulate cumulative memory games. These tasks grow incrementally harder as the agent progresses, thereby acting as an automatic curriculum. They thus assess not only sample efficiency, the traditional focus of reinforcement learning benchmarks, but also memory effectiveness over prolonged engagements. The environments' dynamic, continuous nature challenges an agent's memory retention and recall well beyond what finite tasks demand.
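The cumulative structure described above can be sketched as a toy task in the spirit of "I packed my bag": each round appends one new command, and the agent must reproduce the full sequence from memory. All class and method names here are illustrative assumptions, not Memory Gym's actual API.

```python
import random


class EndlessCommandTask:
    """Toy endless task: each round appends one command to a growing
    sequence, and the agent must recall the entire sequence in order.
    Difficulty escalates automatically, forming a curriculum."""

    COMMANDS = ["up", "down", "left", "right"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.sequence = []

    def new_round(self):
        # One more command per round; only the newest command is revealed,
        # so the agent must remember all earlier ones.
        self.sequence.append(self.rng.choice(self.COMMANDS))
        return self.sequence[-1]

    def score(self, recalled):
        # Reward the number of commands recalled correctly, in order,
        # stopping at the first mistake.
        correct = 0
        for a, b in zip(recalled, self.sequence):
            if a != b:
                break
            correct += 1
        return correct


task = EndlessCommandTask(seed=42)
for _ in range(3):
    task.new_round()
perfect = task.score(list(task.sequence))
print(perfect)  # equals the round count when recall is perfect
```

Because the episode only ends when the agent fails to recall the sequence, the achieved round count itself becomes a direct measure of memory effectiveness rather than sample efficiency.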

To conduct these experiments, the authors extend the capabilities of existing Deep Reinforcement Learning (DRL) algorithms. Specifically, they introduce an open-source implementation that combines Transformer-XL (TrXL) with Proximal Policy Optimization (PPO). This novel combination leverages TrXL as a form of episodic memory with a sliding window technique, aiming to enhance the agent's memory utility in the decision-making process.
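The sliding-window idea can be illustrated with a minimal buffer that caches per-step embeddings and exposes only the most recent window to the attention layers. This is a hypothetical sketch of the concept; the paper's CleanRL implementation of TrXL-PPO differs in detail (e.g., it caches per-layer hidden states, not raw embeddings).

```python
import numpy as np


class SlidingWindowMemory:
    """Sketch of episodic memory via a sliding window: keep the embeddings
    of the last `window` steps and serve them as fixed-shape context for
    attention. Interface names are illustrative assumptions."""

    def __init__(self, window, dim):
        self.window = window
        self.dim = dim
        self.steps = []

    def add(self, embedding):
        self.steps.append(np.asarray(embedding, dtype=np.float32))
        # Slide the window: drop embeddings older than `window` steps.
        if len(self.steps) > self.window:
            self.steps.pop(0)

    def context(self):
        # Zero-pad at episode start so the attention input keeps a
        # fixed (window, dim) shape throughout the episode.
        pad = self.window - len(self.steps)
        mem = np.zeros((self.window, self.dim), dtype=np.float32)
        if self.steps:
            mem[pad:] = np.stack(self.steps)
        return mem


mem = SlidingWindowMemory(window=4, dim=2)
for t in range(6):
    mem.add([float(t), float(t)])
ctx = mem.context()
print(ctx.shape, ctx[0, 0])  # (4, 2) 2.0 -- only steps 2..5 remain
```

The fixed window bounds compute and memory per step, which is what makes the approach viable for PPO rollouts, but it also caps how far back the agent can attend, a limitation relevant to the endless-task results discussed below.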

Observations and Findings

Empirical results show that agent performance varies with the environment and task configuration. Within finite environments, the TrXL variant exhibited superior sample efficiency in certain tasks but revealed limitations in others. In particular, TrXL displayed notable sample-efficiency and effectiveness benefits in the "Mystery Path" environment, whereas in "Searing Spotlights" a GRU-based memory mechanism outperformed the Transformer-based architecture in sample efficiency.

Notably, a pivotal and unexpected finding was the GRU's consistently superior performance in endless tasks. Despite TrXL's sample-efficiency advantages in finite settings, GRU mechanisms surpassed Transformer-XL's memory effectiveness in extended tasks. This observation challenges the conventional preference for attention mechanisms over recurrence and underscores the need to reevaluate memory architectures under continuous task stress.

Implications and Future Directions

The findings within endless tasks suggest that current benchmarks might not fully encapsulate the broader capabilities needed for application-based scenarios where memory effectiveness takes precedence over mere interaction efficiency. This points to potential recalibrations needed in how DRL environments are structured for comprehensive agent evaluation.

The research highlights avenues for future work, particularly in addressing the bottlenecks identified in transformer architectures. It speculates on the broader adoption of emerging sequence models, such as structured state space models, and other novel architectures that might perform better on endless tasks.

In practice, this work underlines the need for more comprehensive evaluation metrics that account for memory effectiveness. The open-source baseline implementation provides a blueprint for further advances in this area, enabling the community to investigate agent memory capacities under prolonged trials, with potential impact on fields that require autonomous decision systems with robust memory.

Overall, the introduction of Memory Gym establishes new ground for evaluating memory effectiveness in AI, prompting existing architectural paradigms to evolve toward robust, memory-intensive applications.
