
A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

Published 14 May 2023 in cs.AI (arXiv:2305.08049v2)

Abstract: The Partially Observable Markov Decision Process (POMDP) provides a principled framework for decision making in stochastic, partially observable environments. However, computing good solutions for problems with continuous action spaces remains challenging. To ease this challenge, we propose a simple online POMDP solver, called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT). At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees, which provide a simple policy representation. Specifically, we maintain a distribution over promising finite-horizon policy trees. This distribution is iteratively updated by sampling policies, evaluating them via Monte Carlo simulation, and refitting the distribution to the top-performing policies. Our method is lazy in the sense that it exploits the policy tree representation to avoid redundant computations in policy sampling, evaluation, and distribution update, yielding computational savings of up to two orders of magnitude. LCEOPT is surprisingly simple compared to existing state-of-the-art methods, yet it empirically outperforms them on several continuous-action POMDP problems, particularly those with higher-dimensional action spaces.
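Since the abstract compresses the method into a few sentences, a minimal sketch of the generic Cross-Entropy loop it describes may help. This is an illustration, not the authors' LCEOPT implementation: the flat Gaussian parameterization over a parameter vector and the `evaluate_policy` callback (standing in for the paper's Monte Carlo policy evaluation) are assumptions made here for brevity, and the paper's distribution is over policy trees with "lazy" bookkeeping that this sketch omits.

```python
import numpy as np

# Minimal sketch of the Cross-Entropy search loop the abstract describes.
# NOTE: this is NOT the paper's LCEOPT. A policy is abstracted as a flat
# parameter vector (e.g., the actions at each node of a fixed-depth policy
# tree), and `evaluate_policy` is a hypothetical stand-in for Monte Carlo
# rollout-based evaluation under the POMDP's belief.

def cross_entropy_search(evaluate_policy, dim, n_samples=100, n_elite=10,
                         n_iters=50, seed=0):
    """Iteratively refit a diagonal Gaussian over policy parameters."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)   # current distribution over policy parameters
    std = np.ones(dim)
    for _ in range(n_iters):
        # 1. Sample candidate policies from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        # 2. Evaluate each policy, e.g., via Monte Carlo simulation.
        scores = np.array([evaluate_policy(p) for p in samples])
        # 3. Refit the distribution to the top-performing (elite) policies.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Toy usage with a quadratic objective standing in for a POMDP rollout:
best = cross_entropy_search(lambda p: -np.sum((p - 0.5) ** 2), dim=4)
```

The "lazy" contribution of the paper lies in step 1-3 bookkeeping: by exploiting the tree structure, sampling, evaluation, and the distribution update skip subtrees whose computations would be redundant, which the dense loop above does not attempt to capture.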

