
Uncoupled Bandit Learning towards Rationalizability: Benchmarks, Barriers, and Algorithms (2111.05486v3)

Published 10 Nov 2021 in cs.GT, cs.LG, and cs.MA

Abstract: Under the uncoupled learning setup, a last-iterate convergence guarantee towards Nash equilibrium has been shown to be impossible in many games. This work studies last-iterate convergence in general games towards rationalizability, a key solution concept in epistemic game theory that relaxes the stringent belief assumptions of both Nash and correlated equilibrium. This learning task naturally generalizes best-arm identification problems, owing to the intrinsic connection between rationalizable action profiles and the elimination of iteratively dominated actions. Although the task seems simple, our first main result is surprisingly negative: a large and natural class of no-regret algorithms, including the entire family of Dual Averaging algorithms, provably takes exponentially many rounds to reach rationalizability. Moreover, algorithms with the stronger no-swap-regret guarantee suffer similar exponential inefficiency. To overcome these barriers, we develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We prove that when all agents run Exp3-DH (i.e., self-play in multi-agent learning), all iteratively dominated actions can be eliminated within polynomially many rounds. Our experimental results further demonstrate the efficiency of Exp3-DH and show that state-of-the-art bandit algorithms, even those developed specifically for learning in games, fail to reach rationalizability efficiently.
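The abstract ties rationalizability to the elimination of iteratively dominated actions. As a rough illustration of that target (not of the learning algorithm itself), the sketch below runs iterated elimination on a small two-player game with fully known payoffs. It assumes dominance by another pure action only, which is a simplification, and the function name `iterated_elimination` and the example payoff matrices are hypothetical, chosen for illustration rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def iterated_elimination(
    payoff_row: Sequence[Sequence[float]],
    payoff_col: Sequence[Sequence[float]],
) -> Tuple[List[int], List[int]]:
    """Return the row and column actions that survive iterated elimination
    of actions strictly dominated by another *pure* action (an assumption
    made here for simplicity)."""
    rows = set(range(len(payoff_row)))
    cols = set(range(len(payoff_row[0])))
    changed = True
    while changed:
        changed = False
        # A row action a is dominated if some other surviving row b is
        # strictly better against every surviving column action.
        dominated_rows = {
            a for a in rows
            if any(
                all(payoff_row[b][c] > payoff_row[a][c] for c in cols)
                for b in rows if b != a
            )
        }
        if dominated_rows:
            rows -= dominated_rows
            changed = True
        # Same check for the column player, against surviving rows only.
        dominated_cols = {
            c for c in cols
            if any(
                all(payoff_col[r][d] > payoff_col[r][c] for r in rows)
                for d in cols if d != c
            )
        }
        if dominated_cols:
            cols -= dominated_cols
            changed = True
    return sorted(rows), sorted(cols)


# Hypothetical 2x3 game: rows are (U, D), columns are (L, C, R).
payoff_row = [[1, 1, 0],
              [0, 0, 2]]
payoff_col = [[0, 2, 1],
              [3, 1, 0]]

print(iterated_elimination(payoff_row, payoff_col))  # → ([0], [1])
```

In this example R is dominated by C, which then makes D dominated by U, which in turn makes L dominated by C, so elimination proceeds in a chain. In the bandit setting of the paper the agents do not see the payoff matrices and must reach the same surviving set from sampled rewards alone.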
