Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning (2402.03046v1)

Published 5 Feb 2024 in cs.LG

Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easy fetching and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.


Summary

  • The paper introduces a benchmark that standardizes reproducible RL experiments across various libraries and environments.
  • The paper details a command-line interface (CLI) that simplifies data extraction, visualization, and rigorous comparisons among RL methods.
  • The paper enhances research transparency by providing complete replication instructions, including hyperparameters and dependency details.

Introduction to Open RL Benchmark

In the pursuit of advancing Reinforcement Learning (RL), researchers require reliable benchmarks to evaluate new algorithmic approaches against established baselines. The lack of comprehensive, accessible, and reproducibly tracked experiments has been a substantial barrier in the field. Open RL Benchmark emerges as a solution, offering an extensive dataset of tracked RL experiments that encompasses various libraries, environments, and metrics. By providing fully documented and replicable experiment settings, this benchmark facilitates the comparison of RL methods and supports the efficient exploration of new ideas in RL research.
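The tracked runs are publicly downloadable. As a minimal sketch of what programmatic access might look like, assuming the runs are hosted on Weights & Biases (the paper's tracking backend); the entity/project names, the config filter, and the metric key below are assumptions, not taken from the paper:

```python
# Minimal sketch: fetch tracked runs and their logged metrics via the
# Weights & Biases API. Entity/project names, the config filter, and the
# metric key are hypothetical placeholders.
import wandb

api = wandb.Api()
runs = api.runs(
    "openrlbenchmark/cleanrl",                 # hypothetical entity/project
    filters={"config.env_id": "Breakout-v5"},  # hypothetical config key/value
)

for run in runs:
    # Each run exposes its full hyperparameter config and logged time series.
    print(run.name, run.config.get("seed"))
    history = run.history(keys=["charts/episodic_return", "global_step"])
    print(history.head())
```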

Insights into Reproducibility and Data Accessibility

Reproducibility lies at the core of scientific progress. In RL, missing documentation, evolving software dependencies, and implementation idiosyncrasies can substantially influence experimental results and their reproducibility. Open RL Benchmark addresses these challenges head-on by providing exact replication instructions for every experiment, including all hyperparameters and dependency versions. This also lets researchers examine fine-grained learning phenomena and outlier runs that are often lost when results are reduced to summary statistics.
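To illustrate the kind of metadata exact replication requires, the sketch below records hyperparameters together with pinned dependency versions; the helper name and output file are hypothetical, not the benchmark's actual tooling:

```python
# Sketch: record what is needed to rerun an experiment exactly, namely
# hyperparameters, the Python version, and pinned package versions.
import json
import subprocess
import sys

def snapshot_run_metadata(hyperparams: dict, path: str = "run_metadata.json") -> None:
    """Write hyperparameters and the frozen dependency list to a JSON file."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    metadata = {
        "python": sys.version,
        "dependencies": frozen,        # exact versions, e.g. "gymnasium==0.29.1"
        "hyperparameters": hyperparams,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)

snapshot_run_metadata({"env_id": "CartPole-v1", "learning_rate": 2.5e-4, "seed": 1})
```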

The Open RL Benchmark CLI and Its Applications

A notable component of Open RL Benchmark is its command-line interface (CLI), designed to ease fetching, analyzing, and visualizing the tracked data. The CLI lets researchers turn the benchmark's runs into publication-ready figures with minimal effort. Notably, every figure in the paper itself was generated through this CLI, demonstrating its utility in practice.
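The exact CLI invocations are not reproduced in this summary. The sketch below only illustrates the kind of aggregation such a figure-generating tool performs, averaging episodic return across seeds and plotting the mean with a deviation band, using synthetic data rather than the benchmark's real interface:

```python
# Illustrative only: aggregate synthetic learning curves from several seeds
# into a mean curve with a standard-deviation band, as benchmark figures do.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
steps = np.linspace(0, 1_000_000, 200)
# Hypothetical episodic returns for three seeds of one algorithm.
returns = np.stack([
    400 / (1 + np.exp(-(steps - 500_000) / 100_000)) + rng.normal(0, 20, steps.size)
    for _ in range(3)
])

mean, std = returns.mean(axis=0), returns.std(axis=0)
plt.plot(steps, mean, label="PPO (3 seeds)")
plt.fill_between(steps, mean - std, mean + std, alpha=0.3)
plt.xlabel("Global step")
plt.ylabel("Episodic return")
plt.legend()
plt.savefig("learning_curve.png")
```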

Transformative Impact of Open RL Benchmark on RL Research

The introduction of Open RL Benchmark lays the groundwork for more standardized, transparent, and accessible RL research. It enables researchers to build on existing runs rather than spending resources on reproducing baselines, with an unprecedented level of detail and clarity. However, as the benchmark grows through community contributions, maintaining user-friendliness and managing the scale of engagement remain open challenges.

Conclusion

Open RL Benchmark is a significant stride toward resolving long-standing reproducibility and evaluation challenges in RL research. It democratizes access to rich experiment data, enabling reliable comparisons and a deeper understanding of how algorithms behave. Despite the difficulties of scaling and coordinating community collaboration, Open RL Benchmark sets a higher standard for RL research.