Self-Play Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma
Abstract: A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner's dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated "always defect" policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
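To make the setting in the abstract concrete, here is a minimal sketch of self-play tabular Q-learning in the iterated prisoner's dilemma. It is not the authors' exact setup. Several choices are illustrative assumptions:

- The state is the previous joint action (one-step memory).
- Payoffs use Axelrod's classic values.
- Exploration is ε-greedy.
- "Self-play" means both players act from, and update, one shared Q-table.

All function and constant names are hypothetical. Pavlov (win-stay, lose-shift) is encoded directly: it cooperates exactly when both players chose the same action in the previous round. Whether the learned greedy policy matches Pavlov or "always defect" depends on the hyperparameters. The abstract states that only under certain conditions do the agents provably learn Pavlov.

```python
import numpy as np

C, D = 0, 1  # actions: cooperate, defect

# Assumed payoffs (Axelrod's classic values), indexed [my_action, other_action]:
# R = 3 (mutual cooperation), S = 0 (sucker), T = 5 (temptation), P = 1 (mutual defection).
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])


def state_index(my_prev, other_prev):
    """One-step memory: encode the previous joint action as one of 4 states."""
    return 2 * my_prev + other_prev


def pavlov_action(my_prev, other_prev):
    """Win-stay, lose-shift: keep my last action after R or T, switch after S or P."""
    won = PAYOFF[my_prev, other_prev] >= PAYOFF[C, C]
    return my_prev if won else 1 - my_prev


def epsilon_greedy(q, s, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(2))
    return int(np.argmax(q[s]))


def self_play_q_learning(steps=20_000, alpha=0.1, gamma=0.9, epsilon=0.1,
                         q_init=0.0, seed=0):
    """Both players act from, and update, one shared Q-table (assumed notion of self-play)."""
    rng = np.random.default_rng(seed)
    q = np.full((4, 2), q_init)
    prev = (C, C)  # assumed initial joint action
    for _ in range(steps):
        # Each player sees the history from its own perspective: (mine, theirs).
        s1, s2 = state_index(*prev), state_index(prev[1], prev[0])
        a1 = epsilon_greedy(q, s1, epsilon, rng)
        a2 = epsilon_greedy(q, s2, epsilon, rng)
        r1, r2 = PAYOFF[a1, a2], PAYOFF[a2, a1]
        n1, n2 = state_index(a1, a2), state_index(a2, a1)
        # Standard tabular Q-learning update, applied to both players' transitions.
        for s, a, r, n in ((s1, a1, r1, n1), (s2, a2, r2, n2)):
            q[s, a] += alpha * (r + gamma * q[n].max() - q[s, a])
        prev = (a1, a2)
    return q


def greedy_policy(q):
    """Map each previous joint action (mine, theirs) to the greedy action."""
    return {(m, o): int(np.argmax(q[state_index(m, o)]))
            for m in (C, D) for o in (C, D)}


PAVLOV = {(m, o): pavlov_action(m, o) for m in (C, D) for o in (C, D)}
ALWAYS_DEFECT = {(m, o): D for m in (C, D) for o in (C, D)}

q = self_play_q_learning()
learned = greedy_policy(q)
print("learned policy:", learned)
print("Pavlov:", learned == PAVLOV, "| always defect:", learned == ALWAYS_DEFECT)
```

Changing `q_init` (optimistic vs. pessimistic initialization), `epsilon`, or `gamma` can change which greedy policy emerges. The paper's results, not this sketch, characterize when Pavlov is provably learned.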