Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity (2401.12764v3)
Abstract: This paper develops a new variant of two-time-scale stochastic approximation for finding the roots of two coupled nonlinear operators when only noisy samples of these operators can be observed. Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators from their samples. The averaged estimates are then used in the two-time-scale stochastic approximation updates to find the desired solution. Our main theoretical result shows that, under a strong monotonicity condition on the underlying nonlinear operators, the mean-squared errors of the iterates generated by the proposed method converge to zero at the optimal rate $O(1/k)$, where $k$ is the number of iterations. This significantly improves on existing results for two-time-scale stochastic approximation, where the best known finite-time convergence rate is $O(1/k^{2/3})$. We illustrate this result by applying the proposed method to develop new reinforcement learning algorithms with improved performance.
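The scheme described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact algorithm: the toy operators, step-size schedules, and constants are all assumptions chosen for demonstration. It maintains Ruppert-Polyak-style running averages of noisy operator samples and feeds those averaged estimates into the two-time-scale updates, on a simple strongly monotone problem whose joint root is $(x^*, y^*) = (1, 1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem (illustration only): find the roots of the
# coupled operators
#   F(x, y) = x - y      (fast variable x tracks the slow variable y)
#   G(x, y) = y - 1      (slow variable y converges to 1)
# so the joint root is (x*, y*) = (1, 1). Only noisy samples are observed.
def F_sample(x, y):
    return (x - y) + 0.1 * rng.standard_normal()

def G_sample(x, y):
    return (y - 1.0) + 0.1 * rng.standard_normal()

x, y = 5.0, -3.0          # arbitrary initial iterates
f_hat, g_hat = 0.0, 0.0   # running averaged estimates of F and G

for k in range(200_000):
    lam = 1.0 / (k + 2) ** 0.6   # averaging step size (assumed schedule)
    alpha = 1.0 / (k + 2)        # fast-time-scale step size
    beta = 0.5 / (k + 2)         # slow-time-scale step size
    # 1) refresh the averaged operator estimates from fresh noisy samples
    f_hat = (1.0 - lam) * f_hat + lam * F_sample(x, y)
    g_hat = (1.0 - lam) * g_hat + lam * G_sample(x, y)
    # 2) two-time-scale updates driven by the averaged estimates,
    #    rather than by the raw noisy samples
    x -= alpha * f_hat
    y -= beta * g_hat

print(x, y)  # both iterates should end up near the root (1, 1)
```

The averaging step decays more slowly than the update steps, so the estimates `f_hat` and `g_hat` can track the slowly moving operator values while smoothing out the sampling noise; this is the mechanism the abstract credits for the improved $O(1/k)$ rate.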