ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints (2405.00644v1)
Abstract: To plan safely in uncertain environments, agents must balance utility with safety constraints. Safe planning problems can be modeled as a chance-constrained partially observable Markov decision process (CC-POMDP) and solutions often use expensive rollouts or heuristics to estimate the optimal value and action-selection policy. This work introduces the ConstrainedZero policy iteration algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy with an additional network head that estimates the failure probability given a belief. This failure probability guides safe action selection during online Monte Carlo tree search (MCTS). To avoid overemphasizing search based on the failure estimates, we introduce $\Delta$-MCTS, which uses adaptive conformal inference to update the failure threshold during planning. The approach is tested on a safety-critical POMDP benchmark, an aircraft collision avoidance system, and the sustainability problem of safe CO$_2$ storage. Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
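The abstract's key mechanism, updating a failure threshold with adaptive conformal inference and using a learned failure probability to gate action selection, can be sketched as follows. This is a minimal illustration of the general ACI-style recurrence (threshold nudged toward a target violation rate) and of constraint-filtered action selection, not the paper's exact Δ-MCTS update; the function names, the step size `eta`, and the dictionary-based action interface are all illustrative assumptions.

```python
def adapt_threshold(delta, p_fail, target_alpha, eta=0.01):
    """One ACI-style threshold update.

    If the current belief's estimated failure probability exceeds the
    threshold (an "error" event), the threshold is relaxed downward is
    avoided: delta moves by eta * (target_alpha - err), so over time
    error events occur at roughly the target rate target_alpha.
    """
    err = 1.0 if p_fail > delta else 0.0  # 1 = step flagged unsafe
    return delta + eta * (target_alpha - err)


def select_action(q_values, p_fail_estimates, delta):
    """Pick the highest-value action whose estimated failure
    probability is within the current threshold; if no action is
    feasible, fall back to the safest available action."""
    feasible = {a: q for a, q in q_values.items()
                if p_fail_estimates[a] <= delta}
    if feasible:
        return max(feasible, key=feasible.get)
    return min(p_fail_estimates, key=p_fail_estimates.get)
```

For example, with `q_values = {"climb": 1.0, "descend": 0.5}` and `p_fail_estimates = {"climb": 0.3, "descend": 0.05}`, a threshold of `delta = 0.1` filters out the higher-value `"climb"` and selects `"descend"`; the threshold itself then drifts toward the target safety level as flagged and unflagged steps accumulate.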