Configurable Mirror Descent: Towards a Unification of Decision Making (2405.11746v1)

Published 20 May 2024 in cs.AI, cs.GT, cs.LG, and cs.MA

Abstract: Decision-making problems, categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative and competitive (e.g., football), are ubiquitous in the real world. Various methods have been proposed to address specific decision-making problems. Despite successes within specific categories, these methods typically evolve independently and cannot generalize to other categories. A fundamental question for decision-making is therefore: Can we develop a single algorithm to tackle ALL categories of decision-making problems? Several main challenges stand in the way: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there is no comprehensive benchmark covering all the categories. This work presents a preliminary attempt to address the question with three main contributions. i) We propose generalized mirror descent (GMD), a generalization of MD variants that considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose configurable mirror descent (CMD), in which a meta-controller dynamically adjusts the hyper-parameters of GMD conditional on the evaluation measures. iii) We construct GameBench, a benchmark of 15 academic-friendly games spanning the different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes than the baselines while providing the capability to explore diverse dimensions of decision making.
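The abstract positions GMD as a generalization of mirror descent (MD) that incorporates multiple historical policies and a broader class of Bregman divergences, with CMD adding a meta-controller that tunes GMD's hyper-parameters based on the evaluation measure. For background only, the sketch below shows the classical online MD update over a policy simplex with the negative-entropy mirror map (the multiplicative-weights update). The function name, step size, and toy gradient are illustrative assumptions; this is not the paper's GMD or CMD, only the standard update they build on.

import numpy as np

def mirror_descent_step(policy, grad, lr=0.1):
    """One online mirror descent update over the probability simplex,
    using the negative-entropy mirror map (multiplicative weights).
    Illustrative background sketch, not the paper's GMD/CMD."""
    # gradient step in the dual (log) space induced by negative entropy
    logits = np.log(policy) - lr * grad
    # map back to the simplex (softmax), with a shift for numerical stability
    new_policy = np.exp(logits - logits.max())
    return new_policy / new_policy.sum()

# usage: a 3-action policy and a fixed loss gradient favoring action 0
policy = np.ones(3) / 3
grad = np.array([-1.0, 0.0, 0.5])
for _ in range(50):
    policy = mirror_descent_step(policy, grad)
print(policy)  # probability mass concentrates on the lowest-loss action

Under this framing, the paper's GMD would replace the single previous policy and the fixed negative-entropy divergence above with multiple historical policies and a configurable Bregman divergence, and CMD would have a meta-controller choose those hyper-parameters per evaluation measure.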
