RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control (2306.03530v4)

Published 6 Jun 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Deep Reinforcement Learning (RL) can yield capable agents and control policies in several domains but is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited due to the lack of real-time guarantees and portability of existing libraries. To address these challenges, we present RLtools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Its novel architecture allows RLtools to be used on a wide variety of platforms, from HPC clusters through workstations and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, RLtools can solve popular RL problems up to 76 times faster than other popular RL frameworks. We also benchmark inference on a diverse set of microcontrollers and show that in most cases our optimized implementation is by far the fastest. Finally, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of TinyRL. The source code as well as documentation and live demos are available through our project page at https://rl.tools.
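The abstract attributes the training speedups to compiling the RL algorithms and the simulation environments together rather than calling across a language or library boundary. The sketch below illustrates that general pattern in plain C++; the environment, policy, and rollout function are hypothetical illustrations, not the actual RLtools API. Because the rollout loop is templated on the environment type, the compiler sees both `Env::step` and the policy at compile time and can inline the entire loop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 1-D point-mass environment (not the RLtools API).
// Compile-time dimensions let downstream code size buffers statically,
// which also matters on microcontrollers with no heap.
struct PointMass {
    static constexpr std::size_t OBS_DIM = 2;    // position, velocity
    static constexpr std::size_t ACTION_DIM = 1; // scalar force
    double pos = 0.0, vel = 0.0;
    // Euler-integrate one control step; reward is negative squared distance.
    double step(double force, double dt = 0.01) {
        vel += force * dt;
        pos += vel * dt;
        return -pos * pos;
    }
};

// Tiny linear policy; in a real setting the weights would be learned.
template <typename Env>
struct LinearPolicy {
    std::array<double, Env::OBS_DIM> w{}; // zero-initialized weights
    double act(const Env& env) const { return w[0] * env.pos + w[1] * env.vel; }
};

// Rollout templated on environment and policy: no virtual dispatch,
// no interpreter boundary, so the whole loop is a candidate for inlining.
template <typename Env, typename Policy>
double rollout(Env env, const Policy& policy, std::size_t steps) {
    double ret = 0.0;
    for (std::size_t i = 0; i < steps; ++i) {
        ret += env.step(policy.act(env));
    }
    return ret;
}
```

With zero weights the policy applies no force, so a rollout from the origin accumulates zero reward, while a rollout started away from the origin accrues negative reward each step.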

