ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints (2405.00644v1)

Published 1 May 2024 in cs.AI

Abstract: To plan safely in uncertain environments, agents must balance utility with safety constraints. Safe planning problems can be modeled as a chance-constrained partially observable Markov decision process (CC-POMDP) and solutions often use expensive rollouts or heuristics to estimate the optimal value and action-selection policy. This work introduces the ConstrainedZero policy iteration algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy with an additional network head that estimates the failure probability given a belief. This failure probability guides safe action selection during online Monte Carlo tree search (MCTS). To avoid overemphasizing search based on the failure estimates, we introduce $\Delta$-MCTS, which uses adaptive conformal inference to update the failure threshold during planning. The approach is tested on a safety-critical POMDP benchmark, an aircraft collision avoidance system, and the sustainability problem of safe CO$_2$ storage. Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
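
The abstract compresses two mechanisms worth unpacking: the value/policy network gains an extra head that estimates a failure probability for the current belief, and $\Delta$-MCTS filters candidate actions against a failure threshold that adaptive conformal inference adjusts online during planning. The sketch below is a minimal, hypothetical Python illustration of those two steps under stated assumptions; it is not the authors' implementation, and all names (DeltaMCTSSketch, select_action, update_threshold, step_size) are illustrative.

```python
# A minimal sketch (not the authors' code) of two ideas from the abstract:
# (1) safe action selection that filters actions on a learned
#     failure-probability estimate, and
# (2) an adaptive-conformal-inference (ACI) style update of the failure
#     threshold Delta (cf. Gibbs and Candes, 2021).
# All names, signatures, and defaults here are assumptions for illustration.

class DeltaMCTSSketch:
    def __init__(self, target_failure_rate: float = 0.01, step_size: float = 0.05):
        self.target = target_failure_rate  # desired chance-constraint level
        self.delta = target_failure_rate   # adaptive failure threshold (Delta)
        self.step = step_size              # ACI step size

    def select_action(self, q_values: dict, failure_probs: dict):
        """Return the highest-value action whose estimated failure
        probability (e.g., from a network's failure head) is within Delta."""
        safe = [a for a, p in failure_probs.items() if p <= self.delta]
        if not safe:
            # No action satisfies the constraint: fall back to the
            # least-risky action rather than halting the search.
            return min(failure_probs, key=failure_probs.get)
        return max(safe, key=lambda a: q_values[a])

    def update_threshold(self, failure_observed: bool):
        """ACI-style update: loosen Delta while observed failures stay
        below the target rate, tighten it after a violation, so realized
        safety tracks the target even under distribution shift."""
        err = 1.0 if failure_observed else 0.0
        self.delta += self.step * (self.target - err)
        self.delta = min(max(self.delta, 0.0), 1.0)  # keep Delta in [0, 1]
```

In this reading, separating the failure estimate from the reward objective is what lets the planner hit a target safety level directly, instead of hand-tuning a penalty that trades rewards against costs.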
