Bridging the Gap between Partially Observable Stochastic Games and Sparse POMDP Methods (2405.18703v2)

Published 29 May 2024 in cs.GT

Abstract: Many real-world decision problems involve the interaction of multiple self-interested agents with limited sensing ability. The partially observable stochastic game (POSG) provides a mathematical framework for modeling these problems; however, solving a POSG requires difficult reasoning over two critical factors: (1) information revealed by partial observations and (2) decisions other agents make. In the single-agent case, partially observable Markov decision process (POMDP) planning can efficiently address partial observability with particle filtering. In the multi-agent case, extensive-form game solution methods account for other agents' decisions, but preclude belief approximation. We propose a unifying framework that combines POMDP-inspired state distribution approximation and game-theoretic equilibrium search on information sets. This paper lays a theoretical foundation for the approach by bounding errors due to belief approximation, and empirically demonstrates effectiveness with a numerical example. The new approach enables planning in POSGs with very large state spaces, paving the way for reliable autonomous interaction in real-world physical environments and complementing multi-agent reinforcement learning.
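
The abstract combines two ingredients: approximating the belief over hidden states with particles (as in sparse POMDP planners) and searching for an equilibrium over the agents' information sets. The sketch below is only an illustration of how those pieces can fit together, not the paper's actual algorithm: it pairs a bootstrap particle filter with regret matching on a toy, state-dependent matrix game. All model functions (transition, observation_likelihood, payoff) and the action set are hypothetical placeholders.

```python
# Illustrative sketch only: not the authors' algorithm. It shows a particle
# belief update followed by regret matching on a toy two-player zero-sum game
# whose payoffs depend on the hidden state. All models are placeholders.
import random
from collections import Counter

def transition(state, action):
    """Toy stochastic transition: the hidden state drifts by the action plus noise."""
    return state + action + random.choice([-1, 0, 1])

def observation_likelihood(obs, state):
    """Toy observation model: the observation is the state corrupted by unit noise."""
    return 1.0 if abs(obs - state) <= 1 else 1e-6

def particle_filter_update(particles, action, obs, n_particles=500):
    """Bootstrap filter: propagate particles, weight by the observation likelihood, resample."""
    propagated = [transition(s, action) for s in particles]
    weights = [observation_likelihood(obs, s) for s in propagated]
    total = sum(weights)
    if total == 0:
        return propagated  # degenerate belief; keep the propagated particles
    return random.choices(propagated, weights=[w / total for w in weights], k=n_particles)

def payoff(state, a1, a2):
    """Hypothetical zero-sum payoff to player 1, modulated by the hidden state's sign."""
    return (1 if a1 == a2 else -1) * (1 if state >= 0 else -1)

def regret_matching(particles, actions, iters=2000):
    """Approximate an equilibrium of the matrix game induced by averaging the
    payoff over the particle belief, using regret matching."""
    avg = {(a1, a2): sum(payoff(s, a1, a2) for s in particles) / len(particles)
           for a1 in actions for a2 in actions}
    regrets1, regrets2 = Counter(), Counter()
    strat_sum1, strat_sum2 = Counter(), Counter()
    for _ in range(iters):
        def strategy(regrets):
            pos = {a: max(regrets[a], 0.0) for a in actions}
            z = sum(pos.values())
            return {a: (pos[a] / z if z > 0 else 1.0 / len(actions)) for a in actions}
        s1, s2 = strategy(regrets1), strategy(regrets2)
        for a in actions:
            strat_sum1[a] += s1[a]
            strat_sum2[a] += s2[a]
        # Expected payoff to player 1 under the current strategy profile.
        u1 = sum(s1[a1] * s2[a2] * avg[(a1, a2)] for a1 in actions for a2 in actions)
        for a in actions:
            regrets1[a] += sum(s2[a2] * avg[(a, a2)] for a2 in actions) - u1
            regrets2[a] += -sum(s1[a1] * avg[(a1, a)] for a1 in actions) + u1
    n1, n2 = sum(strat_sum1.values()), sum(strat_sum2.values())
    return ({a: strat_sum1[a] / n1 for a in actions},
            {a: strat_sum2[a] / n2 for a in actions})

# Usage: update the belief after acting and observing, then approximate an
# equilibrium over two actions at this information set.
belief = [random.randint(-3, 3) for _ in range(500)]
belief = particle_filter_update(belief, action=0, obs=1)
pi1, pi2 = regret_matching(belief, actions=[0, 1])
print(pi1, pi2)
```

In the paper's setting the game at an information set would be induced by the planner's search tree rather than a fixed payoff matrix; the sketch only illustrates how a particle belief can supply the expected payoffs that an equilibrium-search routine consumes.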

Authors (2)
  1. Tyler Becker (5 papers)
  2. Zachary Sunberg (11 papers)
