Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning (2402.03046v1)
Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easily fetching the data and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.