
GUARD: A Safe Reinforcement Learning Benchmark (2305.13681v4)

Published 23 May 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Due to the trial-and-error nature of RL, it is typically challenging to apply RL algorithms to safety-critical real-world applications, such as autonomous driving, human-robot interaction, and robot manipulation, where such errors are not tolerable. Recently, safe RL (i.e., constrained RL) has emerged rapidly in the literature, in which the agents explore the environment while satisfying constraints. Due to the diversity of algorithms and tasks, it remains difficult to compare existing safe RL algorithms. To fill that gap, we introduce GUARD, a Generalized Unified SAfe Reinforcement Learning Development Benchmark. GUARD has several advantages compared to existing benchmarks. First, GUARD is a generalized benchmark with a wide variety of RL agents, tasks, and safety constraint specifications. Second, GUARD comprehensively covers state-of-the-art safe RL algorithms with self-contained implementations. Third, GUARD is highly customizable in tasks and algorithms. We present a comparison of state-of-the-art safe RL algorithms in various task settings using GUARD and establish baselines that future work can build on.
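
As a rough illustration of how a safe RL benchmark of this kind is typically driven (the environment constructor, task name, and the `info["cost"]` key below are assumptions for illustration, not GUARD's documented API), a Gym-style rollout that tracks both task reward and accumulated safety cost might look like this:

```python
# Hypothetical usage sketch: the identifiers below (make_env, the task id, and
# the "cost" key) are illustrative assumptions, not the benchmark's actual API.
# The loop follows the standard Gym-style interface used by most safe RL
# benchmarks, where each step also reports a scalar constraint-violation cost.

def run_episode(env, policy, max_steps=1000):
    """Roll out one episode, accumulating task reward and safety cost."""
    obs = env.reset()
    total_reward, total_cost = 0.0, 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        # Safe RL environments typically expose per-step constraint
        # violations via the info dict; the key name here is an assumption.
        total_cost += info.get("cost", 0.0)
        if done:
            break
    return total_reward, total_cost

# Example (hypothetical task id combining an agent, a task, and a constraint):
# env = make_env("Point_Goal_Hazard")
# reward, cost = run_episode(env, policy=lambda obs: env.action_space.sample())
```

A safe RL algorithm is then judged on both numbers at once: maximizing the accumulated reward while keeping the accumulated cost below a budget (or, for state-wise formulations, at zero per step).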
