A unified stochastic approximation framework for learning in games (2206.03922v2)

Published 8 Jun 2022 in cs.GT, cs.LG, and math.OC

Abstract: We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods, that is, when players only observe their realized payoffs.
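To make the abstract's reference to the exponential/multiplicative weights algorithm concrete, here is a minimal sketch of that update with oracle feedback (the player observes the full payoff vector each round). This is an illustration of the standard algorithm, not the paper's analysis template; the function names, step size, and toy payoffs below are assumptions chosen for the example.

```python
import numpy as np

def exponential_weights(payoff_vector_fn, n_actions, steps=2000, eta=0.1):
    """Exponential/multiplicative weights with oracle feedback.

    Maintains cumulative scores y_t for each pure action and plays the
    mixed strategy x_t = softmax(eta * y_t). `payoff_vector_fn` returns
    the payoff of every pure action against the current mixed play.
    """
    y = np.zeros(n_actions)
    for _ in range(steps):
        # Numerically stable softmax of the scaled scores.
        z = eta * y
        x = np.exp(z - z.max())
        x /= x.sum()
        # Oracle feedback: accumulate the full payoff vector.
        y += payoff_vector_fn(x)
    z = eta * y
    x = np.exp(z - z.max())
    return x / x.sum()

# Toy single-player example: action 1 strictly dominates, so the
# iterates concentrate on it (a "strict equilibrium" in miniature).
payoffs = np.array([0.2, 1.0, 0.5])
x_final = exponential_weights(lambda x: payoffs, n_actions=3)
```

With a strictly dominant action, the score gap grows linearly in the number of steps, so the softmax places nearly all mass on that action; in a bandit variant, the player would instead build an unbiased estimate of the payoff vector from the single realized payoff.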

Citations (12)
