Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning (1907.05861v2)

Published 11 Jul 2019 in cs.AI

Abstract: We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
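The abstract describes SYMBOL as an adaptive stack of Thompson Sampling bandits whose size is bounded by the planning horizon. The Python sketch below illustrates only the general idea of stacked open-loop Thompson Sampling. It is not the authors' implementation, and it rests on several assumptions. Each depth has one bandit over actions, with a Normal-Gamma posterior on returns whose prior hyperparameters are chosen arbitrarily. `sample_state` and `step` are hypothetical stand-ins for the belief sampler and the generative model. SYMBOL's rule for adapting the stack size is not reproduced. Instead, this stack simply grows lazily, so bandits exist only for depths that simulations actually reach, and there are never more than `horizon` of them.

```python
import math
import random


class NormalGammaTS:
    """Thompson Sampling bandit over discrete actions.

    Each arm keeps a Normal-Gamma posterior over the mean and precision of its
    returns. The prior hyperparameters are illustrative assumptions.
    """

    def __init__(self, n_actions, mu0=0.0, lambda0=1.0, alpha0=1.0, beta0=1.0):
        self.prior = (mu0, lambda0, alpha0, beta0)
        self.n = [0] * n_actions       # visit count per arm
        self.mean = [0.0] * n_actions  # running mean of observed returns (Welford)
        self.m2 = [0.0] * n_actions    # running sum of squared deviations (Welford)

    def _sample_mean(self, a, rng):
        # Draw (precision, mean) from the arm's Normal-Gamma posterior; return the mean.
        mu0, lam0, alpha0, beta0 = self.prior
        n, xbar, s = self.n[a], self.mean[a], self.m2[a]
        lam_n = lam0 + n
        mu_n = (lam0 * mu0 + n * xbar) / lam_n
        alpha_n = alpha0 + n / 2
        beta_n = beta0 + 0.5 * s + lam0 * n * (xbar - mu0) ** 2 / (2 * lam_n)
        tau = rng.gammavariate(alpha_n, 1.0 / beta_n)  # gammavariate takes a scale, not a rate
        return rng.gauss(mu_n, 1.0 / math.sqrt(lam_n * tau))

    def select(self, rng):
        # Thompson Sampling: play the arm whose sampled mean is largest.
        samples = [self._sample_mean(a, rng) for a in range(len(self.n))]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, a, ret):
        self.n[a] += 1
        delta = ret - self.mean[a]
        self.mean[a] += delta / self.n[a]
        self.m2[a] += delta * (ret - self.mean[a])

    def best(self):
        # Recommend the most visited arm (ties broken by mean return).
        return max(range(len(self.n)), key=lambda a: (self.n[a], self.mean[a]))


class OpenLoopStack:
    """Stack of bandits, one per planning depth, never deeper than `horizon`."""

    def __init__(self, n_actions, horizon, gamma=0.95, seed=0):
        self.n_actions, self.horizon, self.gamma = n_actions, horizon, gamma
        self.rng = random.Random(seed)
        self.bandits = []  # allocated lazily: only depths that rollouts reach

    def _bandit(self, depth):
        if depth == len(self.bandits):
            self.bandits.append(NormalGammaTS(self.n_actions))
        return self.bandits[depth]

    def simulate(self, sample_state, step):
        # One open-loop rollout: actions depend only on depth, never on observations.
        state, trace = sample_state(self.rng), []
        for depth in range(self.horizon):
            a = self._bandit(depth).select(self.rng)
            state, reward, done = step(state, a, self.rng)
            trace.append((a, reward))
            if done:
                break
        # Back up the discounted return from each depth onward into that depth's bandit.
        ret = 0.0
        for depth in reversed(range(len(trace))):
            a, reward = trace[depth]
            ret = reward + self.gamma * ret
            self.bandits[depth].update(a, ret)

    def plan(self, sample_state, step, n_sims):
        for _ in range(n_sims):
            self.simulate(sample_state, step)
        return self.bandits[0].best()


# Toy usage (hypothetical domain): action 1 earns reward 1 per step, action 0 earns 0.
def toy_state(rng):
    return 0


def toy_step(state, action, rng):
    return state + 1, float(action == 1), False


planner = OpenLoopStack(n_actions=2, horizon=3, seed=1)
first_action = planner.plan(toy_state, toy_step, n_sims=500)
print(first_action, len(planner.bandits))
```

Because the plan is open-loop, each bandit conditions only on its depth and not on the observation history. This keeps memory proportional to the horizon rather than to the size of a search tree, as in POMCP.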
