Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma (2312.08484v1)

Published 13 Dec 2023 in cs.GT

Abstract: The deployment of machine learning systems in the market economy has raised academic and institutional concerns over potential tacit collusion between fully automated agents. Several recent studies in economics have empirically shown that agents guided by machine learning algorithms can develop collusive strategies. In this work, we prove that multi-agent Q-learners playing the iterated prisoner's dilemma can learn to collude. The complexity of the cooperative multi-agent setting yields multiple fixed-point policies for Q-learning: the main technical contribution of this work is to characterize convergence towards a specific cooperative policy. More precisely, we show that in the iterated prisoner's dilemma, with optimistic Q-values, any self-play Q-learner provably learns a cooperative policy called Pavlov, also known as the win-stay, lose-shift policy, in sharp contrast to the Pareto-dominated always-defect policy.
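The setting described in the abstract is small enough to simulate directly. Below is a minimal, self-contained Python sketch of two self-play Q-learners in the iterated prisoner's dilemma with optimistic Q-value initialization, where each state is the previous round's joint action. The payoff values (T=5, R=3, P=1, S=0), learning rate, discount factor, and the small epsilon-greedy exploration term are illustrative assumptions, not the paper's constants; in particular, the paper's proof concerns greedy play under optimistic Q-values, and the exploration term here is only a convenience so that all four states are visited in a finite run.

```python
import random

# Assumed prisoner's dilemma payoffs (T=5, R=3, P=1, S=0); illustrative only.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ("C", "D")
STATES = tuple(PAYOFF)          # state = previous joint action
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.05

def optimistic_q():
    # Optimistic initialization: start every Q-value at the maximal
    # discounted return T / (1 - gamma), so values can only be revised down.
    return {s: {a: 5 / (1 - GAMMA) for a in ACTIONS} for s in STATES}

def act(q, s):
    # Greedy with respect to the (optimistic) Q-values, plus a little
    # exploration so this finite simulation visits all four states.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[s][a])

def update(q, s, a, r, s_next):
    # Standard Q-learning update.
    target = r + GAMMA * max(q[s_next].values())
    q[s][a] += ALPHA * (target - q[s][a])

q1, q2 = optimistic_q(), optimistic_q()
state = ("C", "C")
for _ in range(200_000):
    s1, s2 = state, (state[1], state[0])   # each player sees (own, other)
    a1, a2 = act(q1, s1), act(q2, s2)
    r1, r2 = PAYOFF[(a1, a2)]
    update(q1, s1, a1, r1, (a1, a2))
    update(q2, s2, a2, r2, (a2, a1))
    state = (a1, a2)

# Pavlov (win-stay, lose-shift): cooperate iff both actions matched last round.
for s in STATES:
    print(s, "->", max(ACTIONS, key=lambda a: q1[s][a]))
# Expected under Pavlov: CC -> C, CD -> D, DC -> D, DD -> C
```

For suitable hyperparameters and enough iterations, the greedy policy read off `q1` matches Pavlov on all four states (cooperate after CC and DD, defect after CD and DC), which is the convergence behavior the paper establishes analytically rather than empirically.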

Authors (4)
  1. Quentin Bertrand
  2. Juan Duque
  3. Emilio Calvano
  4. Gauthier Gidel
Citations (2)