Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback (2112.02856v4)
Abstract: We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of \textit{smooth and strongly monotone} games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension). We then show that if each player applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the \textit{last iterate} to the unique Nash equilibrium at a rate of $\tilde{\Theta}(nT^{-1/2})$. Prior to our work, the best-known convergence rate in this class of games was $\tilde{O}(n^{2/3}T^{-1/3})$ (achieved by a different algorithm), leaving open the problem of designing an optimal no-regret learning algorithm (the known lower bound is $\Omega(nT^{-1/2})$). Our results settle this open problem and contribute to the broad landscape of bandit game-theoretic learning by identifying the first doubly optimal bandit learning algorithm: it achieves (up to log factors) both the optimal regret in single-agent learning and the optimal last-iterate convergence rate in multi-agent learning. We also present preliminary numerical results on several application problems to demonstrate the efficacy of our algorithm in terms of iteration count.
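To make the abstract's reference to self-concordant barriers concrete, the sketch below shows the general flavor of barrier-based single-point bandit learning: a perturbed action is sampled from the Dikin ellipsoid of a self-concordant barrier, a one-point gradient estimate is formed from the single observed reward, and an ascent step preconditioned by the barrier Hessian is taken. This is a minimal illustrative sketch, not the paper's algorithm: the unit-ball action set, the log barrier, the step-size and exploration schedules, and the helper names `barrier_hessian` / `bandit_step` are all assumptions made for illustration.

```python
import numpy as np

def barrier_hessian(x):
    """Hessian of R(x) = -log(1 - ||x||^2), a self-concordant barrier for the unit ball."""
    s = 1.0 - x @ x
    return (2.0 / s) * np.eye(len(x)) + (4.0 / s**2) * np.outer(x, x)

def bandit_step(reward_fn, x, eta, delta, rng):
    """Play one perturbed action, observe only its scalar reward, and update the iterate."""
    n = len(x)
    H = barrier_hessian(x)
    w, V = np.linalg.eigh(H)
    A = V @ np.diag(w ** -0.5) @ V.T          # H^{-1/2}: shapes the sampling (Dikin) ellipsoid
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                    # uniform direction on the unit sphere
    y = x + delta * A @ u                     # the action actually played (bandit query)
    r = reward_fn(y)                          # only this scalar reward is observed
    g_hat = (n / delta) * r * np.linalg.solve(A, u)   # one-point gradient estimate
    x_new = x + eta * np.linalg.solve(H, g_hat)       # ascent step preconditioned by H
    if x_new @ x_new >= 1.0:                  # crude safeguard to remain strictly feasible
        x_new *= 0.99 / np.linalg.norm(x_new)
    return x_new, y, r

# Usage sketch: maximize the strongly concave reward f(x) = -||x - 0.3||^2 over the unit ball.
rng = np.random.default_rng(0)
x = np.zeros(5)
for t in range(1, 2001):
    x, y, r = bandit_step(lambda z: -np.sum((z - 0.3) ** 2), x,
                          eta=0.2 / t, delta=t ** -0.25, rng=rng)
print("final iterate:", np.round(x, 3))       # compare against the maximizer 0.3 * ones(5)
```

In a game, each player would run such an update on its own reward; the paper's contribution is showing that a (suitably designed) barrier-based scheme of this kind attains both the optimal $\tilde{\Theta}(n\sqrt{T})$ single-agent regret and the optimal $\tilde{\Theta}(nT^{-1/2})$ last-iterate rate to the Nash equilibrium in strongly monotone games.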
Authors: Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou