Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation (2404.02378v1)
Abstract: We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work. This improvement is comparable to a square root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.
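For reference, the two conditions named in the abstract are usually stated as follows (standard formulations in the style of Schmidt and Le Roux [2013] and Vaswani et al. [2019]; the paper's exact assumptions may differ slightly): interpolation requires that every stochastic gradient vanishes at a minimizer $x^*$, i.e. $\nabla f_i(x^*) = 0$ for all $i$, while the strong growth condition with constant $\rho$ requires $\mathbb{E}_i\big[\|\nabla f_i(x)\|^2\big] \le \rho\,\|\nabla f(x)\|^2$ for every $x$. Under strong growth, the claimed improvement is that the accelerated rate's dependence on this constant drops from $\rho$ to $\sqrt{\rho}$.

The sketch below shows the kind of method such an analysis covers: plain accelerated SGD (SGD with a Nesterov-style extrapolation step) on a least-squares problem. The objective, step size, and momentum schedule are illustrative assumptions only, not the paper's generalized estimating-sequences scheme.

    import numpy as np

    def accelerated_sgd_least_squares(A, b, eta, n_iters, seed=0):
        # Illustrative accelerated SGD on f(x) = (1/2n) * ||A x - b||^2,
        # sampling one row per iteration (interpolation holds when A x = b
        # is consistent). The step size eta and the momentum schedule
        # k / (k + 3) are generic choices, not the tuning from the paper.
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        x_prev = x.copy()
        for k in range(n_iters):
            beta = k / (k + 3)                   # Nesterov-style momentum weight
            y = x + beta * (x - x_prev)          # extrapolation point
            i = rng.integers(n)                  # sample a single data point
            grad = (A[i] @ y - b[i]) * A[i]      # stochastic gradient of f_i at y
            x_prev, x = x, y - eta * grad        # gradient step from y
        return x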
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. 
Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. 
In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 
3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 
12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. 
USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 
545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. 
Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. 
[2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 
1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 
1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020)
- Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
- D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
- Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
- Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
- Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
- Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
- Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
- Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
- Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
- Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
- Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schapire, R.E., Freund, Y., Barlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher, D.H. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 322–330. Morgan Kaufmann (1997) Liu et al. [2022] Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). 
In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. 
- Schapire, R.E., Freund, Y., Barlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher, D.H. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 322–330. Morgan Kaufmann (1997) Liu et al. [2022] Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. 
[2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. 
[2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 
2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 
269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) 
Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. 
USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. 
[2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. 
arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. 
[2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. 
Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. 
CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 
12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. 
J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). 
In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 
269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) 
Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 
545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. 
In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
[2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019)
Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019)
Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020)
Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^{2})$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021)
Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. 
arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. 
USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. 
OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 
80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. 
In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. 
SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. 
CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) 
Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. 
USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. 
PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. 
(eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. 
Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. 
In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
- Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.)
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference on Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020.
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming.
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre et al.
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. 
In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. 
In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
- D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) 
The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. 
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
- Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
- Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
- Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^{2})$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
- Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
- Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
- Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
- Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
- Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
- Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
- Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
- Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
- Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
- Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A.
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) 
The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
- Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^{2})$. In: Doklady AN SSSR, vol. 269, pp. 543–547 (1983)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Schmidt et al.
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 
1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
- Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) 
The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. 
SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 
- Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
- Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
- Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
- Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
- Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
- Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
- Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021)
- Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
- Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
- Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
- Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
- Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
- d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
- Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
- Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
- Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
- Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
- Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
- Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
- Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
- Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
- Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent.
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
- Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
- Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
- Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
- Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
- Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
- Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
- Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
- Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
- d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
- Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
- Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
- Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
- Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
- Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
- Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
- Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
- Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
- Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
- Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
- Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
- Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions.
Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 
162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
- Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
- d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
- Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
- Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
- Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
- Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
- Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
- Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
- Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
- Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
- Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
- Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
- Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
- Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
- Aaron Mishkin
- Mert Pilanci
- Mark Schmidt