Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach (2402.02954v3)

Published 5 Feb 2024 in cs.GT and cs.LG

Abstract: A recent theory shows that a multi-player decentralized partially observable Markov decision process can be transformed into an equivalent single-player game, enabling the application of Bellman's principle of optimality to solve the single-player game by breaking it down into single-stage subgames. However, this approach entangles the decision variables of all players in each single-stage subgame, resulting in backups with double-exponential complexity. This paper demonstrates how to disentangle these decision variables while maintaining optimality under hierarchical information sharing, a prominent management style in our society. To achieve this, we apply the principle of optimality to solve any single-stage subgame by breaking it down further into smaller subgames, enabling us to make decisions for one player at a time. Our approach reveals that, for any single-stage subgame, there always exists an extensive-form game whose solution solves that subgame, significantly reducing time complexity. Our experimental results show that the algorithms leveraging these findings can scale up to much larger multi-player games without compromising optimality.
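
The complexity argument in the abstract can be illustrated with a small toy problem. Below is a minimal, hypothetical sketch (not the paper's algorithm) of a one-stage, two-player team decision problem in which player 2 observes both its own signal and player 1's signal, a simple instance of hierarchical information sharing. The signal sets, action sets, and randomly generated reward are assumptions made purely for the illustration. The sketch contrasts an entangled backup, which enumerates joint decision rules, with a disentangled backup that optimizes the more-informed player's action pointwise, one signal pair at a time, and checks that both reach the same optimal value.

```python
# Illustrative sketch (assumptions only, not the paper's algorithm):
# a one-stage, two-player team game where player 2 observes (z1, z2),
# i.e. strictly more than player 1, mimicking hierarchical information sharing.
import itertools
import random

random.seed(0)

Z1 = ["x", "y"]      # player 1's private signals
Z2 = ["u", "v"]      # player 2's private signals
A1 = [0, 1]          # player 1's actions
A2 = [0, 1, 2]       # player 2's actions

# Uniform joint signal distribution and a random team reward (toy data).
prob = {(z1, z2): 1.0 / (len(Z1) * len(Z2)) for z1 in Z1 for z2 in Z2}
reward = {(z1, z2, a1, a2): random.uniform(0, 1)
          for z1 in Z1 for z2 in Z2 for a1 in A1 for a2 in A2}

def value(d1, d2):
    """Expected team reward of decision rules d1: Z1 -> A1 and d2: Z1 x Z2 -> A2."""
    return sum(p * reward[z1, z2, d1[z1], d2[z1, z2]]
               for (z1, z2), p in prob.items())

# (a) Entangled backup: enumerate every joint decision rule.
best_joint = max(
    value(dict(zip(Z1, a1s)),
          dict(zip(itertools.product(Z1, Z2), a2s)))
    for a1s in itertools.product(A1, repeat=len(Z1))
    for a2s in itertools.product(A2, repeat=len(Z1) * len(Z2)))

# (b) Disentangled backup: because player 2 sees (z1, z2) and knows player 1's
# rule, its decision can be optimized pointwise inside the expectation.
best_sequential = max(
    sum(p * max(reward[z1, z2, dict(zip(Z1, a1s))[z1], a2] for a2 in A2)
        for (z1, z2), p in prob.items())
    for a1s in itertools.product(A1, repeat=len(Z1)))

assert abs(best_joint - best_sequential) < 1e-9
print(best_joint, best_sequential)
```

In this toy instance the joint enumeration inspects |A1|^|Z1| * |A2|^(|Z1||Z2|) = 4 * 81 rule combinations, while the disentangled version considers only |A1|^|Z1| = 4 rules for player 1 with a pointwise maximization for player 2 inside the expectation, yet both return the same optimal value. This mirrors, at a toy scale, the kind of per-player disentangling the abstract describes.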

References (39)
  1. Optimizing memory-bounded controllers for decentralized POMDPs. arXiv preprint arXiv:1206.5258, 2012.
  2. Decentralized control of partially observable Markov decision processes. In CDC, 2013.
  3. Solving Transition Independent Decentralized Markov Decision Processes. JAIR, 22:423–455, 2004.
  4. Bellman, R. E. Dynamic Programming. Dover Publications, Incorporated, 1957.
  5. The Complexity of Decentralized Control of Markov Decision Processes. Mathematics of Operations Research, 27, 2002.
  6. Cooperative Multi-agent Policy Gradient. In ECML-PKDD, pp. 459–476, 2018.
  7. Point-based incremental pruning heuristic for solving finite-horizon Dec-POMDPs. In AAMAS, pp. 569–576, 2009.
  8. Scaling up decentralized MDPs through heuristic search. In de Freitas, N. and Murphy, K. P. (eds.), UAI, pp. 217–226, 2012.
  9. Optimally solving Dec-POMDPs as continuous-state MDPs. In IJCAI, pp. 90–96, 2013.
  10. Exploiting Separability in Multi-Agent Planning with Continuous-State MDPs. In AAMAS, 2014.
  11. Optimally solving Dec-POMDPs as continuous-state MDPs. JAIR, 2016.
  12. Counterfactual multi-agent policy gradients. In AAAI, 2018.
  13. Cooperative inverse reinforcement learning. In NIPS, 2016.
  14. Dynamic Programming for Partially Observable Stochastic Games. In AAAI, 2004.
  15. Solving partially observable stochastic games with public observations. In AAAI, 2019.
  16. Heuristic search value iteration for one-sided partially observable stochastic games. In AAAI, 2017.
  17. Planning and acting in partially observable stochastic domains. Artificial Intelligence, pp. 99–134, 1998.
  18. Actor-critic algorithms. In Neural Information Processing Systems, 1999.
  19. Rethinking formal models of partially observable multiagent decision making. Artificial Intelligence, 303:103645, 2022.
  20. Multi-agent actor-critic for mixed cooperative-competitive environments. In NIPS, volume 30, pp. 6379–6390, 2017.
  21. Point-based value iteration with optimal belief compression for Dec-POMDPs. In NIPS, 2013.
  22. An efficient, generalized Bellman update for cooperative inverse reinforcement learning. In ICML, 2018.
  23. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In International Joint Conference on Artificial Intelligence (IJCAI), 2003.
  24. Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs. In AAAI, 2005.
  25. Optimal control strategies in delayed sharing information structures. IEEE Transactions on Automatic Control, 2010.
  26. Decentralized stochastic control with partial history sharing: A common information approach. IEEE Transactions on Automatic Control, 58(7):1644–1658, 2013.
  27. Oliehoek, F. A. Sufficient plan-time statistics for decentralized POMDPs. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
  28. Heuristic search for identical payoff Bayesian games. In AAMAS, pp. 1115–1122, 2010.
  29. Decentralized control of a multiple access broadcast channel: performance bounds. In CDC, volume 1, pp. 293–298, 1996. doi: 10.1109/CDC.1996.574318.
  30. Learning to cooperate via policy search. arXiv preprint cs/0105032, 2001.
  31. Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, volume 3, pp. 1025–1032, 2003.
  32. The complexity of multiagent systems: The price of silence. In AAMAS, 2003.
  33. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In ICML, 2018.
  34. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
  35. Reinforcement learning: An introduction. MIT press, 2018.
  36. An optimal best-first search algorithm for solving infinite horizon Dec-POMDPs. In ECML, 2005.
  37. Tan, M. Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents. In Huhns, M. N. and Singh, M. P. (eds.), Readings in Agents, pp. 487–494. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.
  38. Tsitsiklis, J. N. Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1984.
  39. Optimally solving two-agent decentralized POMDPs under one-sided information sharing. In ICML, pp. 10473–10482, 2020.