No Compromise in Solution Quality: Speeding Up Belief-dependent Continuous POMDPs via Adaptive Multilevel Simplification (2310.10274v2)

Published 16 Oct 2023 in cs.AI and cs.RO

Abstract: Continuous POMDPs with general belief-dependent rewards are notoriously difficult to solve online. In this paper, we present a complete provable theory of adaptive multilevel simplification for two settings: a given externally constructed belief tree, and MCTS that constructs the belief tree on the fly using an exploration technique. Our theory makes it possible to accelerate POMDP planning with belief-dependent rewards without any sacrifice in the quality of the obtained solution. We rigorously prove each theoretical claim in the proposed unified theory. Using the general theoretical results, we present three algorithms to accelerate continuous POMDP online planning with belief-dependent rewards. Two of them, SITH-BSP and LAZY-SITH-BSP, can be utilized on top of any method that constructs a belief tree externally. The third, SITH-PFT, is an anytime MCTS method that permits plugging in any exploration technique. All our methods are guaranteed to return exactly the same optimal action as their unsimplified equivalents. We replace the costly computation of information-theoretic rewards with novel adaptive upper and lower bounds, which we derive in this paper and which are of independent interest. We show that these bounds are easy to calculate and can be tightened on demand by our algorithms. Our approach is general: any bounds that monotonically converge to the reward can be utilized to achieve a significant speedup without any loss in performance. Our theory and algorithms support the challenging setting of continuous states, actions, and observations. Beliefs can be parametric or general, represented by weighted particles. We demonstrate in simulation a significant speedup in planning compared to baseline approaches, with guaranteed identical performance.
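
The core mechanism the abstract describes can be illustrated with a short sketch. The Python below is a minimal, hypothetical rendering of the idea, not the paper's SITH-BSP or SITH-PFT: the EntropyBounds class, its subset-based placeholder estimate, and the select_action loop are all assumptions introduced here for illustration. What it shows is only the invariant the abstract states: keep cheap lower and upper bounds on each candidate action's belief-dependent reward, tighten them on demand, and stop as soon as one action's interval dominates every other, at which point the exact rewards would necessarily yield the same optimal action.

import numpy as np

class EntropyBounds:
    """Hypothetical adaptive lower/upper bounds on a belief-dependent
    reward (e.g. negative entropy) of a weighted-particle belief; a
    stand-in for any bounds that monotonically converge to the reward."""

    def __init__(self, particles, weights, num_levels=5):
        self.particles = particles
        self.weights = np.asarray(weights, dtype=float)
        self.level = 1
        self.num_levels = num_levels
        self._update()

    def _update(self):
        # Placeholder: evaluate on a particle subset whose size grows with
        # the simplification level; the slack shrinks to zero at the finest
        # level, so there lower == upper == the (stand-in) exact reward.
        k = max(1, len(self.weights) * self.level // self.num_levels)
        estimate = -np.log(np.mean(self.weights[:k]))
        slack = (self.num_levels - self.level) / self.num_levels
        self.lower, self.upper = estimate - slack, estimate + slack

    def refine(self):
        """Tighten by one simplification level; False once exhausted."""
        if self.level >= self.num_levels:
            return False
        self.level += 1
        self._update()
        return True

def select_action(bounds):
    """Return the action whose reward interval dominates all others,
    refining bounds only as far as that decision requires. `bounds`
    maps each candidate action to its EntropyBounds instance."""
    while True:
        best = max(bounds, key=lambda a: bounds[a].lower)
        others = [a for a in bounds if a != best]
        # Non-overlapping intervals: the exact rewards would give the
        # same argmax, so the simplified choice loses no quality.
        if all(bounds[best].lower >= bounds[a].upper for a in others):
            return best
        # Otherwise tighten the widest competitor and the leader, retry.
        widest = max(others, key=lambda a: bounds[a].upper - bounds[a].lower)
        progressed = bounds[widest].refine()
        progressed = bounds[best].refine() or progressed
        if not progressed:
            return best  # fully refined: bounds are exact, argmax is exact

# Toy usage with random particle weights per action:
rng = np.random.default_rng(0)
actions = {a: EntropyBounds(None, rng.dirichlet(np.ones(100)), num_levels=4)
           for a in ("left", "right", "stay")}
print(select_action(actions))

Because the bounds converge to the exact reward at the finest level, the loop either terminates early with a provably identical action choice or falls back to the fully refined computation; the speedup the abstract reports comes from how often the early exit fires.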

Authors (3)
  1. Andrey Zhitnikov
  2. Ori Sztyglic
  3. Vadim Indelman
Citations (4)
