State-wise Safe Reinforcement Learning: A Survey (2302.03122v3)

Published 6 Feb 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, or in other words, constraint satisfaction. State-wise constraints are among the most common constraints in real-world applications and among the most challenging constraints in Safe RL. Enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of the State-wise Constrained Markov Decision Process (SCMDP), we discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantees and scalability, (ii) safety versus reward performance, and (iii) safety after convergence versus during training. We also summarize the limitations of current methods and discuss potential future directions.
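
To make the distinction drawn in the abstract concrete, here is a minimal sketch of the two constraint types, assuming the standard formulation (the symbols $c_i$, $d_i$, and $w_i$ are illustrative, not quoted from the paper): a conventional Constrained MDP bounds an expected cumulative cost, whereas a state-wise constraint must hold at every step along every trajectory.

CMDP (cumulative): $\mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t} c_i(s_t, a_t)\right] \le d_i$

SCMDP (state-wise): $c_i(s_t, a_t, s_{t+1}) \le w_i$ for all $t \ge 0$ along trajectories induced by $\pi$

The state-wise form is strictly more demanding: a policy can satisfy a cumulative budget while still incurring occasional large per-step costs, which the state-wise constraint rules out.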

Authors (5)
  1. Weiye Zhao (24 papers)
  2. Tairan He (22 papers)
  3. Rui Chen (310 papers)
  4. Tianhao Wei (25 papers)
  5. Changliu Liu (134 papers)
Citations (50)
