
Online POMDP Planning with Anytime Deterministic Guarantees (2310.01791v2)

Published 3 Oct 2023 in cs.AI and cs.RO

Abstract: Autonomous agents operating in real-world scenarios frequently encounter uncertainty and must make decisions based on incomplete information. Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs). However, finding an optimal plan for a POMDP can be computationally expensive and is feasible only for small tasks. In recent years, approximate algorithms, such as tree-search and sample-based methods, have emerged as state-of-the-art POMDP solvers for larger problems. Despite their effectiveness, these algorithms offer only probabilistic, and often asymptotic, guarantees of convergence toward the optimal solution due to their reliance on sampling. To address these limitations, we derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one. First, we derive bounds for selecting a subset of the observations to branch from while computing a complete belief at each posterior node. Then, since a complete belief update may be computationally demanding, we extend the bounds to support reduction of both the state and the observation spaces. We demonstrate how our guarantees can be integrated with existing state-of-the-art solvers that sample a subset of states and observations. As a result, the returned solution carries deterministic bounds relative to the optimal policy. Lastly, we substantiate our findings with supporting experimental results.
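As a reading aid (not taken from the paper), the abstract's "simplified solution" can be pictured against the standard belief-space Bellman recursion, where $b^{a,o}$ denotes the posterior belief after taking action $a$ and receiving observation $o$:

$$ V^*(b) = \max_{a \in \mathcal{A}} \Big[ r(b,a) + \gamma \sum_{o \in \mathcal{O}} \Pr(o \mid b,a)\, V^*(b^{a,o}) \Big]. $$

Branching only on a chosen subset $\bar{\mathcal{O}} \subseteq \mathcal{O}$ yields a cheaper value $\bar{V}(b)$ computed over the truncated sum. A deterministic guarantee of the kind the abstract describes is then a bound $\lvert V^*(b) - \bar{V}(b) \rvert \le \epsilon(\bar{\mathcal{O}})$ whose slack depends only on quantities known at planning time, such as the probability mass of the discarded observations (and, in the extended version, on the sampled subset of states), so it holds on every run rather than with high probability. The paper derives the exact bound terms; the form above is only illustrative.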

