Memory Asymmetry Creates Heteroclinic Orbits to Nash Equilibrium in Learning in Zero-Sum Games (2305.13619v4)
Abstract: Learning in games studies how multiple agents maximize their own rewards through repeated play. Memory, the ability of an agent to condition its actions on the history of play in previous rounds, is often introduced into learning to explore more sophisticated strategies and to model the decision-making of real agents such as humans. However, games with memory are hard to analyze because they exhibit complex phenomena like chaotic dynamics or divergence from Nash equilibrium. In particular, how asymmetry in memory capacities between agents affects learning in games remains unclear. In response, this study formulates a gradient ascent algorithm for games with asymmetric memory capacities. To obtain theoretical insight into the learning dynamics, we first consider the simple case of zero-sum games. We observe complex behavior in which the learning dynamics trace a heteroclinic connection from unstable fixed points to stable ones. Despite this complexity, we analyze the learning dynamics and prove local convergence to these stable fixed points, i.e., the Nash equilibria. We identify the mechanism driving this convergence: the agent with the longer memory learns to exploit the other, which in turn endows the other's utility function with strict concavity. We further numerically confirm such convergence across various initial strategies, numbers of actions, and memory lengths. This study reveals a novel phenomenon caused by memory asymmetry, taking fundamental strides in learning in games and offering new insights into computing equilibria.
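The sketch below illustrates the setting the abstract describes: simultaneous gradient ascent in a repeated zero-sum game where one agent is memory-one (conditioning on the previous joint action) and the other is memoryless. This is not the paper's exact formulation; the choice of Matching Pennies, the sigmoid-logit parameterization of strategies, the learning rate, and the step count are all illustrative assumptions. The per-round utility is computed from the stationary distribution of the Markov chain over joint actions, so both gradients can be taken with automatic differentiation.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation:
# gradient ascent with asymmetric memory in a zero-sum game via JAX autodiff.
import jax
import jax.numpy as jnp

A = jnp.array([[1.0, -1.0], [-1.0, 1.0]])  # Matching Pennies: X gets A, Y gets -A

def stationary_payoff(theta_x, theta_y):
    """Expected per-round payoff of X under the stationary distribution of
    the Markov chain over joint actions (state s = 2*a_x + a_y)."""
    px = jax.nn.sigmoid(theta_x)   # shape (4,): P(a_x = 0 | previous state s)
    qy = jax.nn.sigmoid(theta_y)   # scalar: memoryless P(a_y = 0)
    pa = jnp.stack([px, 1.0 - px], axis=1)   # (4, 2) over X's action
    pb = jnp.stack([qy, 1.0 - qy])           # (2,)  over Y's action
    # Transition matrix T[s, s'] with next state s' = 2*a_x + a_y.
    T = (pa[:, :, None] * pb[None, None, :]).reshape(4, 4)
    # Stationary distribution: pi^T T = pi^T with sum(pi) = 1,
    # solved as a linear system (differentiable through jnp.linalg.solve).
    M = jnp.vstack([(T.T - jnp.eye(4))[:-1], jnp.ones((1, 4))])
    b = jnp.array([0.0, 0.0, 0.0, 1.0])
    pi = jnp.linalg.solve(M, b)
    return pi @ A.reshape(-1)  # X's reward in each joint-action state

grad_x = jax.grad(stationary_payoff, argnums=0)
grad_y = jax.grad(stationary_payoff, argnums=1)

theta_x = jnp.zeros(4)      # memory-one agent: 4 conditional probabilities
theta_y = jnp.array(0.3)    # memoryless agent: 1 mixed-strategy parameter
eta = 0.1                   # learning rate (illustrative)
for _ in range(5000):
    gx, gy = grad_x(theta_x, theta_y), grad_y(theta_x, theta_y)
    theta_x = theta_x + eta * gx   # X ascends its own utility
    theta_y = theta_y - eta * gy   # Y ascends -u, i.e., descends u (zero-sum)

print("X memory-one strategy:", jax.nn.sigmoid(theta_x))
print("Y mixed strategy P(a=0):", jax.nn.sigmoid(theta_y))
```

Solving for the stationary distribution with a linear system keeps the utility differentiable end-to-end, so both updates are plain simultaneous gradient steps. In this toy run one would expect, per the abstract's mechanism, the memory-one agent's learned exploitation to drive the memoryless agent toward the mixed Nash equilibrium rather than the cycling seen in memoryless-vs-memoryless gradient play.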