Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications (2402.15650v3)

Published 23 Feb 2024 in cs.LG and cs.AI

Abstract: Safe reinforcement learning tasks are challenging despite being very common in the real world. The widely adopted CMDP model constrains risks only in expectation, which leaves room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. To address this issue, we first describe the problem with a stronger Uniformly Constrained MDP (UCMDP) model, in which constraints are imposed on all reachable states; we then propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objective according to a safety critic, as a solution to the Lagrangian dual of a UCMDP. We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain where any incorrect behavior can lead to disastrous consequences. In the driving domain, we evaluate on open-source and proprietary data and assess transfer to a real autonomous fleet. Empirically, we demonstrate that our proposed method, when combined with existing safe RL algorithms, can match the task reward achieved by baselines with significantly fewer constraint violations.
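The abstract contrasts two constraint models: a standard CMDP bounds each cumulative cost only in expectation over trajectories, while the UCMDP imposes the bound at every reachable state. A plausible rendering of that distinction, with notation assumed rather than taken from the paper:

\[
\text{CMDP:}\ \ \mathbb{E}_{\tau \sim \pi}\Big[\sum_t \gamma^t c_i(s_t, a_t)\Big] \le d_i,
\qquad
\text{UCMDP:}\ \ \mathbb{E}_{\tau \sim \pi}\Big[\sum_t \gamma^t c_i(s_t, a_t) \,\Big|\, s_0 = s\Big] \le d_i \ \ \text{for all reachable } s.
\]

Objective Suppression is described here only at a high level: a safety critic estimates constraint risk, and the task-reward objective is adaptively down-weighted wherever that risk is high. The Python sketch below is a minimal illustration of that idea inside a policy-gradient loss; the function and variable names (suppressed_policy_loss, safety_q_values) and the clamp-based weighting rule are hypothetical, not the authors' implementation.

import torch

def suppressed_policy_loss(log_probs, task_advantages, safety_q_values, lambdas):
    """Minimal sketch (not the paper's code): adaptively suppress the
    task-reward objective according to a safety critic.

    log_probs:       log pi(a|s) for sampled actions, shape (batch,)
    task_advantages: advantage estimates for the task reward, shape (batch,)
    safety_q_values: per-constraint risk estimates from a safety critic,
                     shape (batch, num_constraints)
    lambdas:         per-constraint multipliers, shape (num_constraints,)
    """
    # Aggregate per-constraint risk into one suppression signal per sample.
    risk = (safety_q_values * lambdas).sum(dim=-1).detach()

    # Hypothetical weighting rule: the more risk the critic predicts,
    # the less weight the task-reward term receives (clamped to [0, 1]).
    suppression = torch.clamp(1.0 - risk, min=0.0, max=1.0)

    # Suppressed task objective, plus a term that pushes the policy away
    # from actions the safety critic flags as risky.
    task_term = suppression * task_advantages * log_probs
    safety_term = -risk * log_probs
    return -(task_term + safety_term).mean()

When the critic predicts no risk, this update reduces to an ordinary policy gradient on the task reward; as predicted risk grows, the task term is suppressed and the safety term dominates, mirroring the suppression behavior the abstract describes.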

