Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback (2112.02856v4)

Published 6 Dec 2021 in cs.LG, cs.GT, and math.OC

Abstract: We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of \textit{smooth and strongly monotone} games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension). We then show that if each player applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the \textit{last iterate} to the unique Nash equilibrium at a rate of $\tilde{\Theta}(nT^{-1/2})$. Prior to our work, the best-known convergence rate in the same class of games was $\tilde{O}(n^{2/3}T^{-1/3})$ (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is $\Omega(nT^{-1/2})$). Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both the optimal regret in single-agent learning and the optimal last-iterate convergence rate in multi-agent learning. We also present preliminary numerical results on several application problems to demonstrate the efficacy of our algorithm in terms of iteration count.
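
For intuition, the sketch below illustrates the general mechanism the abstract alludes to: single-point bandit feedback combined with a self-concordant barrier, where the exploration ellipsoid is shaped by the barrier's Hessian (the Dikin ellipsoid) and a one-point gradient estimate is built from the observed reward alone. It is a minimal, generic illustration in Python and not the paper's algorithm; the feasible set, helper names (`log_barrier_hessian`, `bandit_step`), step sizes, and the simple backtracked ascent update are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch, not the paper's exact algorithm: single-point (bandit) gradient
# estimation shaped by a self-concordant log-barrier over a polytope {y : A y <= b}.

def log_barrier_hessian(x, A, b):
    """Hessian of R(x) = -sum_i log(b_i - a_i^T x) at a strictly feasible x."""
    slack = b - A @ x                       # componentwise slacks, all positive
    return (A / slack[:, None] ** 2).T @ A  # sum_i a_i a_i^T / slack_i^2

def bandit_step(x, reward_oracle, A, b, delta=0.05, eta=0.01):
    """One round: explore over the Dikin ellipsoid, observe a single reward, ascend."""
    n = x.shape[0]
    H = log_barrier_hessian(x, A, b)
    L = np.linalg.cholesky(np.linalg.inv(H))  # L L^T = H^{-1}: shapes the exploration

    u = np.random.randn(n)
    u /= np.linalg.norm(u)                    # uniform direction on the unit sphere
    y = x + delta * (L @ u)                   # played action, stays inside for delta < 1
    r = reward_oracle(y)                      # the only feedback: one reward value

    g_hat = (n / delta) * r * np.linalg.solve(L.T, u)  # one-point gradient estimate

    # Plain ascent step with crude backtracking to stay strictly feasible; the
    # actual method instead uses a barrier-regularized (mirror-descent-style)
    # update with tuned schedules to obtain the stated regret/convergence rates.
    x_new = x + eta * g_hat
    while np.any(A @ x_new >= b):
        eta *= 0.5
        x_new = x + eta * g_hat
    return x_new

if __name__ == "__main__":
    # Toy single-player check on the box 0 <= x <= 1 with a strongly concave
    # reward peaking at (0.3, 0.3); the iterate should drift toward that point.
    A = np.vstack([np.eye(2), -np.eye(2)])
    b = np.array([1.0, 1.0, 0.0, 0.0])
    reward = lambda y: -np.sum((y - 0.3) ** 2)
    x = np.array([0.5, 0.5])
    for _ in range(2000):
        x = bandit_step(x, reward, A, b)
    print(x)
```

The Dikin-ellipsoid exploration is the reason the barrier is useful here: the perturbation automatically shrinks near the boundary of the feasible set, so the played action remains feasible without an explicit projection step.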

Authors (4)
  1. Wenjia Ba
  2. Tianyi Lin
  3. Jiawei Zhang
  4. Zhengyuan Zhou