Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach (2402.02954v3)
Abstract: A recent theory shows that a multi-player decentralized partially observable Markov decision process can be transformed into an equivalent single-player game, enabling the application of Bellman's principle of optimality to solve the single-player game by breaking it down into single-stage subgames. However, this approach entangles the decision variables of all players at each single-stage subgame, resulting in backups with double-exponential complexity. This paper demonstrates how to disentangle these decision variables while maintaining optimality under hierarchical information sharing, a prominent management style in our society. To achieve this, we apply the principle of optimality to solve any single-stage subgame by breaking it down further into smaller subgames, enabling us to make decisions for one player at a time. Our approach reveals that there always exist extensive-form games whose solutions coincide with those of a single-stage subgame, significantly reducing time complexity. Our experimental results show that algorithms leveraging these findings scale up to much larger multi-player games without compromising optimality.
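The complexity gap the abstract describes can be illustrated with a toy counting sketch (not the paper's algorithm; the player/action/observation sizes below are hypothetical). Each player's stage decision is a rule mapping its private observations to actions; an entangled backup ranges over the Cartesian product of all players' rule sets, while deciding one player at a time turns that product into a sum:

```python
# Toy illustration of backup sizes, assuming each player picks a
# decision rule mapping |O| private observations to |A| actions.

def joint_backup_size(n_players: int, n_actions: int, n_obs: int) -> int:
    # Entangled backup: enumerate the Cartesian product of all players'
    # rule sets, i.e. (|A| ** |O|) ** n candidate joint decision rules.
    return (n_actions ** n_obs) ** n_players

def sequential_backup_size(n_players: int, n_actions: int, n_obs: int) -> int:
    # Disentangled backup: fix one player's rule at a time, so the rule
    # sets are explored additively rather than multiplicatively.
    return n_players * (n_actions ** n_obs)

print(joint_backup_size(3, 2, 4))       # 4096 joint decision rules
print(sequential_backup_size(3, 2, 4))  # 48 single-player evaluations
```

Even at these tiny sizes the joint enumeration is two orders of magnitude larger, and the gap widens exponentially with the number of players, which is the scaling behavior the paper's experiments target.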
- Optimizing memory-bounded controllers for decentralized POMDPs. arXiv preprint arXiv:1206.5258, 2012.
- Decentralized control of partially observable Markov decision processes. In CDC, 2013.
- Solving Transition Independent Decentralized Markov Decision Processes. JAIR, 22:423–455, 2004.
- Bellman, R. E. Dynamic Programming. Dover Publications, Incorporated, 1957.
- The Complexity of Decentralized Control of Markov Decision Processes. Mathematics of Operations Research, 27, 2002.
- Cooperative Multi-agent Policy Gradient. In ECML-PKDD, pp. 459–476, 2018.
- Point-based incremental pruning heuristic for solving finite-horizon Dec-POMDPs. In AAMAS, pp. 569–576, 2009.
- Scaling up decentralized MDPs through heuristic search. In de Freitas, N. and Murphy, K. P. (eds.), UAI, pp. 217–226, 2012.
- Optimally solving Dec-POMDPs as continuous-state MDPs. In IJCAI, pp. 90–96, 2013.
- Exploiting Separability in Multi-Agent Planning with Continuous-State MDPs. In AAMAS, 2014.
- Optimally solving Dec-POMDPs as continuous-state MDPs. JAIR, 2016.
- Counterfactual multi-agent policy gradients. In AAAI, 2018.
- Cooperative inverse reinforcement learning. In NIPS, 2016.
- Dynamic Programming for Partially Observable Stochastic Games. In AAAI, 2004.
- Solving partially observable stochastic games with public observations. In AAAI, 2019.
- Heuristic search value iteration for one-sided partially observable stochastic games. In AAAI, 2017.
- Planning and acting in partially observable stochastic domains. Artificial intelligence, pp. 99–134, 1998.
- Actor-critic algorithms. In NIPS, 1999.
- Rethinking formal models of partially observable multiagent decision making. Artificial Intelligence, 303:103645, 2022.
- Multi-agent actor-critic for mixed cooperative-competitive environments. In NIPS, volume 30, pp. 6379–6390, 2017.
- Point-based value iteration with optimal belief compression for Dec-POMDPs. In NIPS, 2013.
- An efficient, generalized Bellman update for cooperative inverse reinforcement learning. In ICML, 2018.
- Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In IJCAI, 2003.
- Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs. In AAAI, 2005.
- Optimal control strategies in delayed sharing information structures. IEEE Transactions on Automatic Control, 2010.
- Decentralized stochastic control with partial history sharing: A common information approach. IEEE Transactions on Automatic Control, 58(7):1644–1658, 2013.
- Oliehoek, F. A. Sufficient plan-time statistics for decentralized POMDPs. In IJCAI, 2013.
- Heuristic search for identical payoff bayesian games. In AAMAS, pp. 1115–1122, 2010.
- Decentralized control of a multiple access broadcast channel: performance bounds. In CDC, volume 1, pp. 293–298, 1996. doi: 10.1109/CDC.1996.574318.
- Learning to cooperate via policy search. arXiv preprint cs/0105032, 2001.
- Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, volume 3, pp. 1025–1032, 2003.
- The complexity of multiagent systems: The price of silence. In AAMAS, 2003.
- QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In ICML, 2018.
- Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
- Reinforcement learning: An introduction. MIT press, 2018.
- An optimal best-first search algorithm for solving infinite horizon Dec-POMDPs. In ECML, 2005.
- Tan, M. Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents. In Huhns, M. N. and Singh, M. P. (eds.), Readings in Agents, pp. 487–494. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.
- Tsitsiklis, J. N. Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1984.
- Optimally solving two-agent decentralized POMDPs under one-sided information sharing. In ICML, pp. 10473–10482, 2020.