
Rollout Heuristics for Online Stochastic Contingent Planning (2310.02345v1)

Published 3 Oct 2023 in cs.AI

Abstract: Partially observable Markov decision processes (POMDPs) are a useful model for decision-making under partial observability and stochastic actions. Partially Observable Monte-Carlo Planning (POMCP) is an online algorithm for deciding on the next action to perform, using a Monte-Carlo tree search approach based on the UCT (UCB applied to trees) algorithm for fully observable Markov decision processes. POMCP develops an action-observation tree and, at the leaves, uses a rollout policy to provide a value estimate for the leaf. As such, POMCP is highly dependent on the rollout policy to compute good estimates, and hence to identify good actions. Thus, many practitioners who use POMCP are required to create strong, domain-specific heuristics. In this paper, we model POMDPs as stochastic contingent planning problems. This allows us to leverage domain-independent heuristics that were developed in the planning community. We suggest two heuristics: the first is based on the well-known h_add heuristic from classical planning, and the second is computed in belief space, taking the value of information into account.
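
For readers unfamiliar with h_add, the following is a minimal sketch of the classical additive heuristic on a deterministic, fully observable STRIPS-style problem with delete effects ignored. It is not the paper's adaptation to stochastic contingent planning, which the abstract does not spell out. The Action type, the fact names, and EXAMPLE_ACTIONS are hypothetical and exist only for illustration. The cost of a fact is 0 if it holds in the current state; otherwise it is the cheapest action cost plus the summed costs of that action's preconditions, and the heuristic value is the sum over the goal facts.

```python
import math
from typing import Dict, FrozenSet, Iterable, NamedTuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action; delete effects are ignored by h_add."""
    name: str
    pre: FrozenSet[str]   # facts that must hold before the action
    add: FrozenSet[str]   # facts the action makes true
    cost: float = 1.0


def h_add(state: Iterable[str], goal: Iterable[str],
          actions: Iterable[Action]) -> float:
    """Additive heuristic: sum of estimated costs of the goal facts."""
    actions = list(actions)
    # Facts already true cost nothing; unseen facts are implicitly infinite.
    cost: Dict[str, float] = {p: 0.0 for p in state}
    changed = True
    while changed:  # iterate to a fixed point (Bellman-Ford style)
        changed = False
        for a in actions:
            if all(q in cost for q in a.pre):
                # Additive assumption: preconditions are achieved independently.
                c = a.cost + sum(cost[q] for q in a.pre)
                for p in a.add:
                    if c < cost.get(p, math.inf):
                        cost[p] = c
                        changed = True
    return sum(cost.get(g, math.inf) for g in goal)


# Illustrative problem: reaching "d" needs both "b" and "c" first.
EXAMPLE_ACTIONS = [
    Action("make-b", frozenset({"a"}), frozenset({"b"})),
    Action("make-c", frozenset({"a"}), frozenset({"c"})),
    Action("make-d", frozenset({"b", "c"}), frozenset({"d"})),
]

if __name__ == "__main__":
    print(h_add({"a"}, {"d"}, EXAMPLE_ACTIONS))  # 3.0
```

In a rollout setting, one plausible use of such an estimate is to prefer actions whose successor states have a lower heuristic value, instead of choosing rollout actions uniformly at random. Whether the paper uses it this way is not stated in the abstract.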
