Safe POMDP Online Planning via Shielding (2309.10216v2)
Abstract: Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. However, the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions that would violate the almost-sure reach-avoid specifications, and integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
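The core integration the abstract describes, restricting POMCP's action selection to shield-allowed actions, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `SHIELD` table, state and action names, and the `shielded_ucb_select` helper are all hypothetical, standing in for a shield that the paper would precompute over the POMDP's belief supports.

```python
import math

# Hypothetical toy shield: maps a belief support (the set of states the agent
# might be in) to the actions under which the almost-sure reach-avoid
# specification remains satisfiable. In the paper such shields are computed
# offline; here the table is hand-written purely for illustration.
SHIELD = {
    frozenset({"s0"}): {"left", "right"},
    frozenset({"s0", "s1"}): {"right"},  # "left" could lead to an unsafe state
}

def shielded_ucb_select(belief_support, stats, c=1.0):
    """Pick the action maximizing UCB1, restricted to shield-allowed actions.

    stats maps action -> (visit_count, total_return). Unvisited allowed
    actions are expanded first, mirroring standard POMCP tree search.
    """
    allowed = SHIELD.get(frozenset(belief_support), set(stats))
    total = sum(stats[a][0] for a in allowed) or 1
    best_action, best_value = None, -math.inf
    for a in allowed:
        n, total_return = stats[a]
        if n == 0:
            return a  # try unvisited allowed actions immediately
        value = total_return / n + c * math.sqrt(math.log(total) / n)
        if value > best_value:
            best_action, best_value = a, value
    return best_action

stats = {"left": (3, 2.0), "right": (5, 4.0)}
print(shielded_ucb_select({"s0", "s1"}, stats))  # "left" is shielded out
```

Because the filter is applied only at the action-selection step, the rest of the Monte-Carlo tree search is untouched, which is consistent with the abstract's claim of negligible runtime overhead.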