Configurable Mirror Descent: Towards a Unification of Decision Making (2405.11746v1)
Abstract: Decision-making problems are ubiquitous in the real world and can be categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative-competitive (e.g., football). Various methods have been proposed to address specific decision-making problems, but despite their successes in individual categories, these methods typically evolve independently and do not generalize to other categories. A fundamental question for decision making is therefore: \emph{Can we develop \textbf{a single algorithm} to tackle \textbf{ALL} categories of decision-making problems?} Several main challenges stand in the way of answering this question: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there is no comprehensive benchmark covering all the categories. This work presents a preliminary attempt to address the question with three main contributions. i) We propose generalized mirror descent (GMD), a generalization of MD variants that considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose configurable mirror descent (CMD), in which a meta-controller dynamically adjusts the hyper-parameters of GMD conditional on the evaluation measures. iii) We construct \textsc{GameBench}, a suite of 15 academic-friendly games spanning the decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes than the baselines while providing the capability to explore diverse dimensions of decision making.
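To make the object being generalized concrete, the following is a minimal sketch of the classical mirror descent update that GMD extends. It uses the negative-entropy mirror map on the probability simplex, under which MD reduces to the multiplicative-weights update; the function names and step size are illustrative, not from the paper.

```python
import numpy as np

def mirror_descent_step(x, grad, eta=0.1):
    """One entropic mirror descent step on the probability simplex.

    With the negative-entropy mirror map, the MD update
        x_{t+1} = argmin_y <eta * grad, y> + D_KL(y || x_t)
    has the closed form x_{t+1,i} proportional to x_{t,i} * exp(-eta * grad_i).
    """
    y = x * np.exp(-eta * grad)
    return y / y.sum()

# Toy example: minimize the linear loss <c, x> over the simplex.
c = np.array([1.0, 0.5, 2.0])   # per-coordinate costs
x = np.ones(3) / 3              # uniform initial point
for _ in range(200):
    x = mirror_descent_step(x, c)
# Mass concentrates on the lowest-cost coordinate (index 1).
```

GMD, as described in the abstract, departs from this single-step recursion by mixing in multiple historical policies and allowing Bregman divergences beyond the KL term used here.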